Automatic Generation of X11 Client Programs From Intercepted Communications

James L. Peterson

Advanced Workstations Division
11400 Burnet Road
Austin, Texas 78758

1 Oct 1991

This paper has been published. It should be cited as

James L. Peterson, ``Automatic generation of X11 client programs from intercepted communications'', The X Journal, Volume 1, Number 4, (March-April 1992), pages 44-51.

1.0 Introduction

The cost of introducing a new computer architecture has grown significantly over the past 30 years. Early computer systems (such as the IBM 7094) could be designed, built, and sold with little or no software support. Over time, however, hardware alone was insufficient. An assembler and loader were necessary before customers would buy a new machine. Soon it was necessary to provide an operating system and then compilers. Users came to expect that software that was available on a previous system would also be available on any new system.

Now no system can be seriously marketed without the concurrent availability of an advanced operating system, compilers for the several most common languages, standard utilities, networking support, and graphical user interfaces. The customer wants a complete working system, not simply a faster CPU or improved hardware architecture.

One result of the need to market a complete system is that many hardware and software projects are developed concurrently, even though there are dependencies among the various parts. For example, we recently found ourselves developing a complex hypertext system written in C for the X11 window system on a Unix operating system at the same time that the compiler, window system, and operating system were still being developed. Among other problems with which we were eventually faced, the performance of the system while under development was poor. We needed to know where to concentrate our efforts to improve performance.

In particular, it seemed possible that the hypertext system itself was not the problem, but that the X11 window system was simply too slow. We needed a way to isolate the computations of our hypertext application from the computations of the window system. Our solution was the development of XPROG, a tool for automatically generating X11 programs which run as fast as X11 clients can run, for a given user session. This allowed us to easily separate the effect of client computation from the X11 server computation. (In our case, although the X11 server did need improvement, our tests showed that the hypertext application also needed much improvement.)

The solution provided by XPROG is similar to the approach used by Xscope [1]. XPROG sits between an X11 client and an X11 server. The client connects to XPROG as if it were an X11 server. XPROG in turn attaches to a real X11 server. All bytes from the client to XPROG are immediately passed on to the real X11 server, while all bytes from the X11 server are passed on to the real X11 client. Thus, except for a minor increase in communication costs, XPROG is completely invisible to the X11 client and the X11 server. The communication between client and server is exactly the same as if XPROG were not present at all.

Figure 1. XPROG monitors the communication between X11 and a client.

Although XPROG does not affect the communication between the X11 client and the X11 server, it can monitor the communication. From its analysis of the X11 communication, XPROG generates a new program as its output. The output of XPROG is a C program which, when compiled and run, will generate the exact same interaction with the X11 server as the original client did on the monitored session. Thus, the new generated program will replay the exact same display as the original client. However, since the new program does no other computation, it issues the commands to the X11 server as fast as possible. There is no delay due to application computation, and we can see exactly how fast the X11 server really is.

For example, consider an X11 application program that reads a set of line segments that outline the political map of Europe, creates a window, scales the line segments to the window, checks for duplicate line segments and for line segments which are too small for the resolution of the screen, and draws the result centered in the window. The output of XPROG for this line drawing program would be mainly a long list of XLIB calls to display a list of line segments, given in absolute pixel coordinates. It would display exactly the same set of line segments as the original application program, but all line segments would be stored in the program as the parameters to XLIB line drawing procedures. Hence there would be no time needed to read the points or to scale or translate them. The XPROG output would drive the X11 server to put up the same map as fast as possible.

Notice also that although the output of XPROG is a program which generates the same display as the original program, it always generates exactly this same display. Since it has no input, it cannot display any other map. Nor will it scale or translate its data if the size of the window changes. The XPROG output only replays exactly the same display that the original program generated. While this may seem limiting, it is exactly the behavior desired in many cases, such as timing the server where a predictable, repeatable input is desired.

This type of program is also useful for demonstration purposes. Often it is useful to be able to generate a display which can be freely distributed or safely exhibited and released to potential customers. The output of XPROG exactly duplicates a previous display, but contains no information about the algorithms or data which generated the display, nor does it contain untested code which might cause a failure at an embarrassing time or as the result of a naive user's actions. The output of XPROG can be freely distributed in either source or compiled form with no proprietary risk.

Figure 2. XPROG generates a new client program.

2.0 Internal Details

The implementation of XPROG is based on the earlier Xscope. The Xscope code is used to establish the communications sockets for XPROG. When a client attaches to XPROG, XPROG immediately attaches to the X11 server and proceeds to a select wait loop. The program waits for input on one of its input file descriptors (fd) from either the client or the server. When input is available, it is read and immediately written to the corresponding output fd.
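The relay step described above can be sketched roughly as follows. This is an illustration only, not the Xscope or XPROG source: the names relay_once and relay_loop, the buffer size, and the decision to stop as soon as either side closes are assumptions, and the socket setup is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy one buffer of input from from_fd to to_fd.
 * Returns the number of bytes relayed, 0 on end of file, -1 on error. */
ssize_t relay_once(int from_fd, int to_fd, unsigned char *buf, size_t cap)
{
    ssize_t n, done = 0;

    assert(buf != NULL && cap > 0);
    n = read(from_fd, buf, cap);
    if (n <= 0)
        return n;                       /* end of file or read error */
    while (done < n) {                  /* write() may accept fewer bytes */
        ssize_t w = write(to_fd, buf + done, (size_t)(n - done));
        if (w < 0)
            return -1;
        done += w;
    }
    /* A monitoring tool would also hand buf[0..n) to its packetizer here. */
    return n;
}

/* Relay in both directions until either side closes its connection. */
void relay_loop(int client_fd, int server_fd)
{
    unsigned char buf[4096];            /* assumed buffer size */
    int maxfd = (client_fd > server_fd ? client_fd : server_fd) + 1;

    for (;;) {
        fd_set readable;

        FD_ZERO(&readable);
        FD_SET(client_fd, &readable);
        FD_SET(server_fd, &readable);
        if (select(maxfd, &readable, NULL, NULL, NULL) < 0)
            return;
        if (FD_ISSET(client_fd, &readable) &&
            relay_once(client_fd, server_fd, buf, sizeof buf) <= 0)
            return;
        if (FD_ISSET(server_fd, &readable) &&
            relay_once(server_fd, client_fd, buf, sizeof buf) <= 0)
            return;
    }
}
```

Writing the data out before any analysis keeps the added latency small, which matches the goal of leaving the client/server conversation unchanged.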

In addition, the input buffer is passed to a routine for packetization. The main function of this routine is to break the stream of input buffers into a stream of X11 packets; each packet contains one request, reply, event, or error. Events and errors are always 32 bytes. Requests have an early field (bytes 2-3) which contains the request length, in units of 4 bytes. Replies also contain a length field (bytes 4-7), which gives the number of additional 4-byte units that follow the fixed 32-byte reply header.
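As a sketch of how a packetizer might find packet boundaries from these fields, consider the following. The names packet_kind, packet_length, and server_packet_kind are hypothetical. The lsb_first flag stands for the byte order the client selected at connection setup, and the packetizer is assumed to have recorded it.

```c
#include <assert.h>
#include <stddef.h>

enum packet_kind { PK_REQUEST, PK_REPLY, PK_EVENT, PK_ERROR };

/* Read a 16-bit protocol field in the connection's byte order. */
static unsigned long get16(const unsigned char *p, int lsb_first)
{
    return lsb_first ? (p[0] | ((unsigned long)p[1] << 8))
                     : (p[1] | ((unsigned long)p[0] << 8));
}

/* Read a 32-bit protocol field in the connection's byte order. */
static unsigned long get32(const unsigned char *p, int lsb_first)
{
    return lsb_first ? (get16(p, 1) | (get16(p + 2, 1) << 16))
                     : (get16(p + 2, 0) | (get16(p, 0) << 16));
}

/* Classify a server-to-client packet by its first byte:
 * 0 is an error, 1 is a reply, anything else is an event. */
enum packet_kind server_packet_kind(unsigned char first_byte)
{
    if (first_byte == 0)
        return PK_ERROR;
    return first_byte == 1 ? PK_REPLY : PK_EVENT;
}

/* Total packet length in bytes, or 0 if the length field has not arrived yet. */
size_t packet_length(enum packet_kind kind, const unsigned char *buf,
                     size_t avail, int lsb_first)
{
    assert(buf != NULL);
    switch (kind) {
    case PK_REQUEST:                    /* bytes 2-3: length in 4-byte units */
        return avail < 4 ? 0 : (size_t)(4 * get16(buf + 2, lsb_first));
    case PK_REPLY:                      /* bytes 4-7: extra 4-byte units */
        return avail < 8 ? 0 : (size_t)(32 + 4 * get32(buf + 4, lsb_first));
    default:                            /* events and errors */
        return 32;
    }
}
```

For the 8-byte MapWindow request, the length field holds 2, so packet_length returns 8.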

Each packet, once completely available in memory, is then interpreted. This is the point where XPROG differs from Xscope. Xscope interprets each packet by printing a dump of its content. XPROG inspects its packet and attempts to generate an XLIB function call that could have generated the packet, printing the source code for that function to its output file.

2.1 Requests

For example, a packet containing a MapWindow command causes XPROG to generate an XMapWindow function call. The MapWindow request consists of an opcode (8), a length (8 bytes in total), and a 32-bit Window Identifier, specifying which window to map. The XMapWindow function requires two parameters: the display and the window to map. The main problem to be resolved with this back translation, then, is how to generate the parameters needed for XMapWindow from the values contained in the MapWindow packet.

The display parameter, for XMapWindow and other XLIB commands, specifies the connection to the window server. Typically there is only one connection to the server, so we generate a global variable dpy in the XPROG output which is used for the display parameter for all XLIB functions. The dpy variable is initialized to the result of the XOpenDisplay function in main() of the XPROG output.

    dpy = XOpenDisplay(getenv("DISPLAY"));
    XMapWindow(dpy, .?.);

The window identifier (ID) is one of a large number of object identifiers that a typical client/server interaction involves. X11 uses 32-bit identifiers to refer to windows, fonts, graphic contexts, and many other X11 objects. The identifiers are chosen by the XLIB support code in the client from a range specified by the server when the connection is established. These identifiers are hidden by C language typedefs from XLIB header files. Thus, a client program refers to a window as a variable of type Window. In XPROG, to generate a name for the window variable, we pass the numeric identifier to the IWINDOW() function which returns a string (char *) representing that window.

For XPROG, we use the lower bits of the numeric X11 identifier to specify a hexadecimal number that is then prefixed by an alphabetic type identifier. Thus, the name of the variable for window 1 is W1; the graphics context with identifier 2 is GC2; the font with identifier 10 is Fa, and so on. Each of the 112 types in X11 has its own function which turns an ID of that type into a name.
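A naming function of this kind might be sketched as below. The mask ID_LOW_MASK is an assumption, since the text does not say how many low bits are kept, and the helper names are hypothetical. Unlike IWINDOW(), which takes a pointer into the request buffer, these helpers take the identifier as an already decoded number and write into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: keep the low 16 bits; the paper does not give the width. */
#define ID_LOW_MASK 0xFFFFUL

/* Format a variable name such as "W1", "GC2", or "Fa" into out. */
const char *id_name(const char *prefix, unsigned long id,
                    char *out, size_t cap)
{
    assert(prefix != NULL && out != NULL && cap > 0);
    snprintf(out, cap, "%s%lx", prefix, id & ID_LOW_MASK);
    return out;
}

/* One thin wrapper per identifier type supplies the alphabetic prefix. */
const char *window_name(unsigned long id, char *out, size_t cap)
{
    return id_name("W", id, out, cap);
}

const char *gc_name(unsigned long id, char *out, size_t cap)
{
    return id_name("GC", id, out, cap);
}

const char *font_name(unsigned long id, char *out, size_t cap)
{
    return id_name("F", id, out, cap);
}
```

Masking off the high bits drops the server-assigned base of the identifier range, so the generated names stay short even though the full identifiers are 32-bit values.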

The names are used in two ways. They are used whenever a request corresponds to an XLIB function which needs a parameter which is specified by an identifier. Thus the code for the MapWindow request is:

    unsigned char *buf;
    /* Request MapWindow is opcode 8 */
    char *window = IWINDOW(&buf[4]);

    fprintf(Codeout, "XMapWindow(%s, %s);\n", DISPLAY, window);

which back translates a MapWindow request into C code such as,

    XMapWindow(dpy, W8);

This code creates a reference to the specified X11 object, in this case a window whose ID is given starting in byte 4 of the request. For this output to be valid C code, there must be a declaration for the variable name (W8). The first use of an identifier causes XPROG to register the identifier, by type, in the XPROG symbol table and to generate a declaration (of the right type). Thus all variable names are declared when their first use is generated. For example, the window in the above XMapWindow call would be declared as,

Window W8;

The C language does not, of course, allow intermixed declarations and commands, so the output of XPROG is generated into two files. The first file is used for all declarations. The second file is used for the executable commands. When XPROG terminates, it creates its output by copying first all declarations and then all commands from these two files. This process creates one output file with all declarations and all commands, in the right order.
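The first-use declaration and the two-file merge could be sketched as follows. The struct emitter, its fixed-size symbol table, and the use of tmpfile() for the two intermediate files are all assumptions made for illustration. The sketch also leaves out the main() wrapper that the generated commands would need.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 1024                /* assumed table size */

struct emitter {
    FILE *decls;                        /* declarations, written on first use */
    FILE *code;                         /* executable commands */
    char  seen[MAX_SYMBOLS][16];        /* names already declared */
    int   nseen;
};

/* Open the two intermediate files; returns 0 on success, -1 on failure. */
int emitter_open(struct emitter *e)
{
    e->decls = tmpfile();
    e->code = tmpfile();
    e->nseen = 0;
    return (e->decls != NULL && e->code != NULL) ? 0 : -1;
}

/* Emit "type name;" the first time name is used; later uses do nothing. */
void use_symbol(struct emitter *e, const char *type, const char *name)
{
    int i;

    for (i = 0; i < e->nseen; i++)
        if (strcmp(e->seen[i], name) == 0)
            return;
    assert(e->nseen < MAX_SYMBOLS && strlen(name) < sizeof e->seen[0]);
    strcpy(e->seen[e->nseen++], name);
    fprintf(e->decls, "%s %s;\n", type, name);
}

/* Append the whole of one file to another. */
static void copy_file(FILE *from, FILE *to)
{
    int c;

    rewind(from);
    while ((c = getc(from)) != EOF)
        putc(c, to);
}

/* Write all declarations, then all commands, to the final output. */
void emitter_finish(struct emitter *e, FILE *out)
{
    copy_file(e->decls, out);
    copy_file(e->code, out);
    fclose(e->decls);
    fclose(e->code);
}
```

A caller would invoke use_symbol(&e, "Window", "W8") before printing each command that mentions W8 to e.code. Only the first call produces a declaration.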

Some requests, such as CreateWindow or CreatePixmap, are used to define a new identifier to the X11 server. The identifier is defined by the XLIB support code and returned to the application program by the corresponding XLIB function. Thus, the XPROG code corresponding to these requests is an assignment as in,

    PX30 = XCreatePixmap(dpy, W12, 13, 13, 1);

which is generated by the XPROG code,

    fprintf(Codeout, "\t%s = XCreatePixmap(%s, %s, %d, %d, %d);\n",
            pixmapId, DISPLAY, drawable, width, height, depth);

Similar techniques can be used with other parameter types. The guidelines for generating parameters for X11 requests are generally:

The use of structures and arrays as parameters from the X11 client to the X11 server requires the most complex code generation. For example, to create a cursor from a pixmap, the CreatePixmapCursor request would be used. This request specifies source and mask pixmaps (identifiers), as well as the location of the origin of the cursor (numbers), and the foreground and background colors for the cursor. The colors are defined as an array or structure of 4 values, three of which specify the red, green, and blue components of the color (RGB). The output from XPROG for such a request is:

   {
   static XColor fore = {0, 0x0000, 0x0000, 0x0000};
   static XColor back = {0, 0xffff, 0xffff, 0xffff};
   CSR7 = XCreatePixmapCursor(dpy, PX3, PX5, &fore, &back, 16, 6);
   }

We use the braces to create a local naming context, allowing us to declare the two color variables fore and back, which define the foreground and background colors. The values for their definition are taken directly from the CreatePixmapCursor request, and represented as (hexadecimal) numbers. This allows XPROG to generate structures or arrays as necessary to pass the correct values to an XLIB function.
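To illustrate how such a structure initializer might be produced from the request bytes, here is a small sketch. The function format_xcolor is hypothetical. The offsets 16 and 22 are where the foreground and background color triples sit in the CreatePixmapCursor request under the X11 protocol encoding, and lsb_first again stands for the connection's byte order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Byte offsets of the two RGB triples within the 32-byte request. */
enum { FORE_RGB_OFFSET = 16, BACK_RGB_OFFSET = 22 };

/* Read a 16-bit protocol field in the connection's byte order. */
static unsigned get16(const unsigned char *p, int lsb_first)
{
    return lsb_first ? (p[0] | ((unsigned)p[1] << 8))
                     : (p[1] | ((unsigned)p[0] << 8));
}

/* Format "static XColor <name> = {0, red, green, blue};" into out.
 * rgb points at three consecutive 16-bit color components.
 * Returns the value of snprintf (the length of the full line). */
int format_xcolor(const char *name, const unsigned char *rgb,
                  int lsb_first, char *out, size_t cap)
{
    assert(name != NULL && strlen(name) > 0);
    assert(rgb != NULL && out != NULL);
    return snprintf(out, cap,
                    "static XColor %s = {0, 0x%04x, 0x%04x, 0x%04x};",
                    name,
                    get16(rgb, lsb_first),
                    get16(rgb + 2, lsb_first),
                    get16(rgb + 4, lsb_first));
}
```

The leading 0 fills the pixel member of XColor, which the cursor request does not use. The three components follow in the order red, green, blue.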

An additional problem in the back translation occurs when there is no XLIB function which directly corresponds to a given X11 request. This results from extra semantics that have been placed in the XLIB interface which are not a direct reflection of the underlying X11 interface; XLIB is not veneer-less. For example, there is no XLIB function corresponding to the X11 OpenFont request. The closest match is the XLIB XLoadFont function which generates both an OpenFont request and a QueryFont request. This means that the back translation must recognize sequences of requests to generate its output with complete accuracy, or simply accept a less than perfect back translation.

2.2 Replies

Some requests to X11 generate replies from X11. Typically these requests are for some property of an object. For example, the QueryFont request generates a reply containing the properties of the font that is queried. Requests are translated back from X11 requests to XLIB function calls, but replies from the server to the client are simply discarded by XPROG. Any effect that the reply might have on the session will show up in the arguments passed in future X11 requests. Since the XPROG generated program is simply going to generate the same sequence of commands as the session that generated it, the returned values from the QueryFont request are of no interest to it.

Suppose, for example, that a client program queries the X11 server, using the QueryFont request, for the size of the characters in a font and uses this information to space the characters across the screen. The information in the reply to the QueryFont request will be used to compute the positions of the text to be displayed. XPROG will save and use the resulting computed positions that are given in the text display requests, but has no way to know how to compute these values from the information in the reply message. Thus, returned function values in the XPROG generated output are simply ignored. A QueryFont request would generate:

    (void)XQueryFont(dpy, F9);

We could simply discard the QueryFont request completely, but doing so would alter the timing characteristics of the application, in terms of the X11 network protocol.

Other XLIB functions have several values to return, and use call-by-reference parameters to return the additional values. For example, the XGetGeometry function returns 7 values in its parameters as well as returning a function value. We discard all of these values by generating the code:

   {
   Window a;
   int b;
   int c;
   unsigned int d;
   unsigned int e;
   unsigned int f;
   unsigned int g;
   (void)XGetGeometry(dpy, W1, &a, &b, &c, &d, &e, &f, &g);
   }

This code makes the request and receives the reply values into temporary variables on the stack which are then discarded. Any effect of these values on the future behavior of the client program will be contained in the parameters used in future requests, just as with the QueryFont request discussed above.

2.3 Errors

Errors are noted as comments in the XPROG output, but require considerable care in the output program. The correct approach to handling errors would be to determine the requests that caused the error and remove them from the output program. This would require buffering the output program until it is certain that errors do not occur. The other approach, which is more faithful to the original session, would be to define an error handler in the output program which would allow errors to occur, but would ignore them.

Some programs do generate and handle errors. For example, the Andrew Toolkit attempts to open named fonts. If the font does not exist, an error is generated by the X11 server. This error is caught and handled by the Andrew Toolkit, causing it to pick a substitute font for the non-existent one. There are two aspects of this behavior. The first aspect is that the font that is used is the last font opened (the one that does not cause an error), and so the requests which generate errors can be deleted from the output program for correct behavior. The second aspect is that considerations of timing should include all requests, and the subsequent errors. Hence if a large number of error messages are generated, removing them may affect the timing characteristics of the output program.

2.4 Events

Events are the hardest portion of the interaction to generate in the output program. With no attention paid to events, running the XPROG output program typically causes no useful display operation. The output program starts, establishes a connection to the X11 server, creates windows, outputs to the windows and then exits, breaking the connection to the X11 server which then deletes all references to the newly created windows, often before it can even put the windows on the screen. At best, the user sees a quick flicker on the screen as the XPROG output program starts, runs, and exits almost immediately. The first use of an XPROG output program without event handling shows that events must be handled.

Events act to pace an X11 program. The Expose event which results from a MapWindow request, for example, serves to synchronize the client requests with the ability of the server to do something with those requests. The time that it takes for a user to move and click the mouse on a menu also provides the time that the user needs to read the menu, and that ``user interaction'' time provides natural stopping (or at least pausing) points for the program. A typical X11 client will create some initial display and then wait for some user input (keyboard or mouse). The user input will cause it to change its display until it again waits for more user input, and so on. Thus an X11 client alternates between two phases: waiting for user input, and changing the display to react to that input. (This model ignores the fact that ``time'' is often another cause in changing the display, such as in a clock program.)

The result of this consideration is to divide X11 events into two different types: uninteresting events and interesting events. Uninteresting events include such things as mouse movement, enter and exit notification, and focus events. Uninteresting events are simply discarded by XPROG. Interesting events include keyboard and mouse button input (user pacing events), and Expose events (X11 server pacing events). XPROG generates code to wait for interesting events: for each interesting event that it finds in the session communication, it generates a call to a WaitForEvent procedure.

The WaitForEvent procedure has two parameters: an event type to wait for and a prompt message. The prompt message is written to the user console (stderr) to notify the user of the type of event that is wanted (in case the event requires user input such as a mouse or keyboard button press). Then events are received until an event of the desired type arrives; any other events received in the meantime are discarded. If the event is already in the event queue, there will be no pause. If it is not, the queue is drained and the procedure waits for the event to arrive. Synchronization events, such as Expose events from MapWindow requests, will typically be in the event queue and require no user action. User pacing events, such as mouse and keyboard input, will prompt with a message and wait for the desired user input.
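The control flow of such a procedure can be modelled without Xlib as shown below. This is a sketch of the logic only, not the XPROG code: next_event_fn is a hypothetical stand-in for a blocking call such as XNextEvent, and canned_events is a test double that replays a fixed list of event types. The extra parameters and the return value (the number of discarded events, or -1 if the source runs dry) are additions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event source: returns the next event type, or a
 * negative value when no more events can arrive. */
typedef int (*next_event_fn)(void *source);

/* Prompt, then consume events until one of the wanted type arrives. */
int wait_for_event(int wanted, const char *prompt,
                   next_event_fn next, void *source)
{
    int discarded = 0;
    int type;

    assert(prompt != NULL && next != NULL);
    fprintf(stderr, "%s\n", prompt);    /* tell the user what is awaited */
    while ((type = next(source)) != wanted) {
        if (type < 0)
            return -1;                  /* source closed before the event */
        discarded++;                    /* any other event is dropped */
    }
    return discarded;
}

/* Test double: replays a fixed array of event types. */
struct canned_events {
    const int *types;
    int count;
    int pos;
};

int canned_next(void *source)
{
    struct canned_events *q = source;

    return q->pos < q->count ? q->types[q->pos++] : -1;
}
```

With a real connection, the event source would block in the X library until the server delivers the next event. That blocking is what gives user pacing events their pause.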

A pseudo-event is used at the end of the program to keep the display from disappearing until the user explicitly dismisses it. This pseudo-event has a prompt message of ``Exit'', but waits for any Button Press.

3.0 Evaluation

The XPROG concept has been (partially) implemented and used to generate output programs. It is a successful aid in program generation. XPROG can easily generate very long and complex programs. In fact the major problem with the output of XPROG is that it tends to be a very long straight-line program. This can stress some compilers (some C compilers are not prepared to accept a 15,000 line straight-line main program), and even for human review, the program could stand to be broken up into more meaningful phases, if these could be automatically recognized.

The XPROG output is fragile code. It needs the same environment to run that the original client program had. To the degree that the client program uses features of the environment that might not be present elsewhere, the XPROG output cannot generally handle a change to a different display device or a different set of fonts. There is no provision in the current code to handle errors. In addition, the order of events must match the order in which they occurred with the original client program.

Finally, although we did implement many of the 120 different X11 requests, we did not complete the implementation -- there were some requests that our clients did not use, and we did not then invest the time to support these commands in our prototype implementation.

4.0 References

  1. James L. Peterson, "XSCOPE: A Debugging and Performance Tool for X11", Information Processing 89, Elsevier Science Publishers, 1989, pages 49-54.