XSCOPE: A Debugging and Performance Tool for X11

James L. Peterson

Software Technology Program
Microelectronics and Computer Technology Corporation (MCC)
3500 West Balcones Center
Austin, Texas 78759

15 Sept 1988

This paper has been published. It should be cited as

James L. Peterson, ``XSCOPE: A Debugging and Performance Tool for X11'', Information Processing 89: Proceedings of the IFIP 89 Congress, Elsevier Science Publishers, (August 1989), pages 49-54.


The X Window System architecture consists of separate processes for the window server and its clients. When errors or poor performance are a problem in such a system, it may be difficult to determine whether the problem lies with the client, some client toolkit, or the server. XSCOPE allows a programmer to view the actual request, reply, error, and event packets sent between the client and the server, providing the data needed to understand the system. The design of XSCOPE indicates some problems with the syntax of the X11 protocol.

Table of Contents

  1. Introduction
  3. Design of XSCOPE
  4. Evaluation of XSCOPE
  5. References

1.0 Introduction

One of the major advances in computer technology over the past 5 years has been the development of window systems. A window system allows a display screen to be partitioned into several areas (windows). Programs can use windows to interact with a user, displaying results and requesting input. The user controls the size and placement of windows on the screen. Window system technology is still being developed, resulting in new, often competing, window systems.

The X11 window system is the latest in a series of window systems developed at MIT under the guidance of the Athena project [Scheifler and Gettys 1986]. The X window system has roots in the W window system from Stanford, but has been redesigned many times, most recently from X10 to X11.

The X11 window system is a window server. A window server operates as a separate process controlling allocation and access to the display screen. Application programs, called clients, run as processes which communicate with the window server using interprocess communication, possibly over a network. The resulting architecture is shown in Figure 1.

Figure 1. X11 Client-Server Model.

The communication between the X11 window server and an X11 client is defined by the X11 protocol. The X11 protocol defines the form and content of requests that the client can make to X11 and the replies, errors, and events which the X11 server may return to the client. The protocol specifies the type and sequence of values in requests, replies, errors and events.

The X11 protocol is a byte-level protocol [Scheifler 1988]. Useful programming is more commonly accomplished by using a library or toolkit of higher-level functions which issue sequences of lower level X11 requests. For example, an application may interact with its user in terms of text, buttons, scroll bars, and menus. These higher-level objects are supported by a separately programmed toolkit.

Sometimes several layers of libraries and toolkits may be involved, with one toolkit built on top of another toolkit which uses a library to issue the actual X11 requests. As a result the application program may be quite distant from the X11 requests which are sent to the server. If subtle errors exist in the toolkits which support the application, they may be very difficult to detect.

Even more difficult to detect than actual errors in toolkits might be non-optimal translation from higher-level interfaces to the X11 protocol interface. For example, an early version of the GNU Emacs editor running on X10 appeared to work well, until the X10 server was slowed down for debugging. In this situation, it became apparent that the Emacs editor was redrawing its display two to three times more often than necessary. At the higher, normal speed of the X10 server, the extra redisplay still occurred, but was not visible.


XSCOPE is a tool to allow the interaction between an X11 client and an X11 server to be analyzed. XSCOPE sits between the client and the server and ``listens'' to all communication between them (Figure 2). XSCOPE is designed to print a trace of this communication allowing programmers to see the correctness and efficiency of their use of X11. It is particularly valuable to those programmers who write toolkits.

Figure 2. XSCOPE monitors the communication between X11 and a client.

An earlier version of XSCOPE was written for X10. This first version proved very useful in the implementation and debugging of the NX, MX, and WSII modules of DELI [Halasz et al 1987] which support X10. A similar program was useful in implementing the NeWS support for WSII, although in that case the communication between the client (WSII) and the server (NeWS) was simple ASCII characters [Gosling 1986]. The NeWS version was generalized into a simple program that allows TCP/IP communication to be traced in ASCII, hex, or octal. This general program was first used to show the bytes flowing between an X11 client and an X11 server, as shown in Figure 3.

Figure 3. Annotated hex trace of X11/Client communication.

 0.00: Client -->   12 bytes
42 00 00 0b 00 00 00 00 00 00 00 00
 0.04:               8 bytes <-- Server
01 00 00 0b 00 00 00 22
 0.10:             136 bytes <-- Server
00 00 00 02 00 70 00 00 00 0f ff ff 00 00 00 00 00 10 40 00 01 02 01 01 20 20
08 81 0e ff f7 70 4d 49 54 20 58 20 43 6f 6e 73 6f 72 74 69 75 6d 01 01 20 32
00 00 00 00 08 08 20 32 00 00 00 00 00 08 00 6b 00 08 00 65 00 00 00 00 00 00
00 01 00 00 00 00 06 40 05 00 01 c3 01 69 00 01 00 01 00 08 00 64 02 40 01 01
01 ff 00 01 00 00 38 3a 00 08 00 64 00 01 00 02 00 00 00 00 00 00 00 00 00 00
00 00 ff ff ff ff
 0.14: Client -->   48 bytes
37 00 00 06 00 70 00 00 00 08 00 6b 00 00 00 0c 00 00 00 01 00 00 00 00 14 00
00 06 00 08 00 6b 00 00 00 17 00 00 00 1f 00 00 00 00 05 f5 e1 00
 0.18:              32 bytes <-- Server
01 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
1e de 00 00 00 00
 0.36: Client -->   20 bytes
10 00 00 05 00 0a 00 00 58 74 48 61 73 49 6e 70 75 74 00 00
 0.38:              32 bytes <-- Server
01 ff 00 03 00 00 00 00 00 00 00 49 0e ff f7 0c 00 00 00 14 00 00 00 14 00 00
00 14 00 00 00 0a
 0.44: Client -->   24 bytes
10 00 00 06 00 0e 00 00 58 74 54 69 6d 65 72 45 78 70 69 72 65 64 00 00
 0.46:              32 bytes <-- Server
01 ff 00 04 00 00 00 00 00 00 00 4a 0e ff f7 0c 00 00 00 18 00 00 00 18 00 00
00 18 00 00 00 0a
 0.46: Client -->   28 bytes
10 00 00 07 00 13 00 00 57 4d 5f 43 4f 4e 46 49 47 55 52 45 5f 44 45 4e 49 45
44 00
 0.58:              32 bytes <-- Server
01 ff 00 05 00 00 00 00 00 00 00 4b 0e ff f7 0c 00 00 00 1c 00 00 00 1c 00 00
00 1c 00 00 00 0a
 0.58: Client -->   16 bytes
10 00 00 04 00 08 00 00 57 4d 5f 4d 4f 56 45 44
 0.64:              32 bytes <-- Server
01 ff 00 06 00 00 00 00 00 00 00 4c 0e ff f7 0c 00 00 00 10 00 00 00 10 00 00
00 10 00 00 00 0a

Figure 3 shows a portion of a trace between an X11 client and an X11 server. At time 0.00 the client sends 12 bytes of data to the server. After 0.04 seconds, the server responds with 8 bytes of data, followed 0.06 seconds later (at time 0.10) by 136 more bytes. Bytes continue to flow back and forth between the client and the server. Each buffer of information is printed, in hex, along with its size, the time it was sent, and whether the client or the server sent it.

The trace in Figure 3 was generated by our general TCP/IP trace program. XSCOPE shows this same information, but interprets the bytes to show what they mean under the X11 protocol. Figure 4 shows a trace between an X11 client and an X11 server as generated by XSCOPE. XSCOPE can vary the amount of trace output generated; Figure 4 shows the default level of tracing.

Figure 4. XSCOPE trace of X11/Client communication.

 0.00: Client -->   12 bytes
                  byte-order: MSB first
               major-version: 11
               minor-version: 0
 0.04:                                     8 bytes <-- X11 Server
 0.12:                                   136 bytes <-- X11 Server
                                        protocol-major-version: 11
                                        protocol-minor-version: 0
                                              release-number: 00000002
                                            resource-id-base: 00700000
                                            resource-id-mask: 000fffff
                                          motion-buffer-size: 00000000
                                            image-byte-order: MSB first
                                        bitmap-format-bit-order: MSB first
                                        bitmap-format-scanline-unit: 32
                                        bitmap-format-scanline-pad: 32
                                                 min-keycode: 8 (^H)
                                                 max-keycode: 129 (\201)
                                                      vendor: "MIT X Consortium"
                                              pixmap-formats: (2)
                                                       roots: (1)
 0.30: Client -->   48 bytes
         ............REQUEST: CreateGC
          graphic-context-id: GXC 00700000
                    drawable: DWB 0008006b
                  value-mask: foreground | background
                          foreground: 00000001
                          background: 00000000
         ............REQUEST: GetProperty
                      delete: False
                      window: WIN 0008006b
                    property: <RESOURCE_MANAGER>
                        type: <STRING>
                 long-offset: 00000000
 0.38:                                    32 bytes <-- X11 Server
                                         ..............REPLY: GetProperty
                                                      format: 0
                                                        type: <NONE>
                                                 bytes-after: 00000000
 0.50: Client -->   20 bytes
         ............REQUEST: InternAtom
              only-if-exists: False
                        name: "XtHasInput"
 0.52:                                    32 bytes <-- X11 Server
                                         ..............REPLY: InternAtom
                                                        atom: ATM 00000049
 0.52: Client -->   24 bytes
         ............REQUEST: InternAtom
              only-if-exists: False
                        name: "XtTimerExpired"
 0.54:                                    32 bytes <-- X11 Server
                                         ..............REPLY: InternAtom
                                                        atom: ATM 0000004a
 0.54: Client -->   28 bytes
         ............REQUEST: InternAtom
              only-if-exists: False
                        name: "WM_CONFIGURE_DENIED"
 0.54:                                    32 bytes <-- X11 Server
                                         ..............REPLY: InternAtom
                                                        atom: ATM 0000004b
 0.56: Client -->   16 bytes
         ............REQUEST: InternAtom
              only-if-exists: False
                        name: "WM_MOVED"
 0.56:                                    32 bytes <-- X11 Server
                                         ..............REPLY: InternAtom
                                                        atom: ATM 0000004c

Since the underlying TCP/IP socket connections used by X11 and XSCOPE are network connections, the processes may run on separate machines, communicating over a network, or on the same machine. Let us assume that a client is to be run on machine A and an X11 server on machine B. XSCOPE can be run on machine A, machine B, or another machine, C. When XSCOPE is started, a command line parameter indicates the name of the machine with the X11 server (and possibly the display). When the client program is started, it is given the name of the XSCOPE machine.

The client program connects to XSCOPE; XSCOPE immediately connects to the real X11 server. The X11 client thinks that XSCOPE is an X11 server and the X11 server thinks that XSCOPE is an X11 client. All bytes from the real X11 client to XSCOPE are sent on to the real X11 server, and all replies from the real X11 server to XSCOPE are sent on to the real X11 client: XSCOPE simply sits in the middle providing a trace of what the real server and client are doing.

3.0 Design of XSCOPE

Much of the high level design and code for XSCOPE was carried over directly from the previous X10 XSCOPE and the general TCP/IP scope. The main loop of XSCOPE waits for input from either a client or the server, or a new client connection. If a connection is made from a new client, XSCOPE creates a new connection to the X11 server for that client and updates its tables.

When input arrives from a client, it is immediately written through to the X11 server for this client; when input arrives from the X11 server, it is immediately written through to the client. Thus, XSCOPE is transparent to both the server and the client, and its performance impact is minimal. In both cases, once the data has been written through, it is also passed on to a procedure for further processing and printing.
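The write-through step can be sketched as follows. This is an illustration under assumptions, not the XSCOPE source: the names relay_once, trace_fn, and count_bytes are hypothetical, and plain POSIX read() and write() on file descriptors stand in for whatever socket handling XSCOPE actually uses. The point it shows is the ordering described above: the data is forwarded first and only then handed to the tracing code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical hook: called with each buffer after it has been forwarded. */
typedef void (*trace_fn)(const unsigned char *data, size_t len, void *ctx);

/* Example tracer: adds len to the size_t counter that ctx points to. */
void count_bytes(const unsigned char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/*
 * Read one buffer from from_fd, write all of it to to_fd, and only then
 * hand it to the tracer. Returns the number of bytes forwarded, 0 on end
 * of file, or -1 on an I/O error.
 */
long relay_once(int from_fd, int to_fd, trace_fn trace, void *ctx)
{
    unsigned char buf[2048];
    ssize_t n;
    size_t done = 0;

    assert(from_fd >= 0 && to_fd >= 0);
    n = read(from_fd, buf, sizeof buf);
    if (n <= 0)
        return (long)n;

    while (done < (size_t)n) {               /* write() may be partial */
        ssize_t w = write(to_fd, buf + done, (size_t)n - done);
        if (w < 0)
            return -1;
        done += (size_t)w;
    }
    if (trace != NULL)
        trace(buf, (size_t)n, ctx);          /* tracing follows forwarding */
    return (long)n;
}
```

A main loop would call relay_once once for whichever descriptor has input ready, with the client and server descriptors swapped for the two directions.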

The processing code breaks the incoming data into meaningful packets -- one packet for each request, reply, error, or event -- and calls another routine to interpret and print the packet. Since only complete packets can be processed by the print procedures, it is necessary to buffer incomplete packets until enough bytes arrive to provide a complete packet.
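One way to buffer incomplete packets is sketched below. The structure and function names (pending, pending_append, pending_take) and the fixed capacity are assumptions made for illustration; the caller is assumed to know, from the packet header, how many bytes a complete packet needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PENDING_MAX 4096   /* illustrative capacity, not an XSCOPE constant */

struct pending {
    unsigned char bytes[PENDING_MAX];
    size_t used;           /* number of buffered bytes */
};

/* Append newly arrived bytes. Returns 0 on success, -1 if they do not fit. */
int pending_append(struct pending *p, const unsigned char *data, size_t len)
{
    assert(p != NULL && data != NULL);
    if (len > PENDING_MAX - p->used)
        return -1;
    memcpy(p->bytes + p->used, data, len);
    p->used += len;
    return 0;
}

/*
 * If a complete packet of `need` bytes is buffered, copy it to out,
 * remove it from the buffer, and return 1. Otherwise return 0 and
 * leave the buffer untouched so that more bytes can be appended.
 */
int pending_take(struct pending *p, size_t need, unsigned char *out)
{
    assert(p != NULL && out != NULL);
    if (need == 0 || p->used < need)
        return 0;
    memcpy(out, p->bytes, need);
    memmove(p->bytes, p->bytes + need, p->used - need);
    p->used -= need;
    return 1;
}
```

Calling pending_take in a loop after each pending_append hands every complete packet to the print procedures and leaves any trailing fragment for the next arrival.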

3.1 Packet Lengths

The X11 protocol is significantly better than the X10 protocol in allowing XSCOPE to determine packet lengths, but could still be improved in many ways. Each X11 request contains a field (bytes 2 and 3) which gives the length of the request in words. (A word is 4 bytes.) However, since bytes 0 and 1 may contain useful data, we must first get and save the first 4 bytes before being able to determine the length of the request. If the request length had been first in the request, we would be able to determine the length of the request without needing to save any bytes.
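As a small worked example, the length computation for a request might look like the sketch below. The function name request_length_bytes is hypothetical, and the msb_first flag is assumed to have been taken from the byte-order byte of the client's setup message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the first 4 bytes of a request, return its total length in bytes.
 * Bytes 2 and 3 hold the length in 4-byte words; msb_first selects the
 * byte order announced by the client at connection setup.
 */
unsigned long request_length_bytes(const unsigned char hdr[4], int msb_first)
{
    unsigned long words;

    assert(hdr != NULL);
    words = msb_first
        ? ((unsigned long)hdr[2] << 8) | hdr[3]
        : ((unsigned long)hdr[3] << 8) | hdr[2];
    return words * 4;
}
```

Applied to the CreateGC request in Figure 3, whose first bytes are 37 00 00 06, this gives 6 words, or 24 bytes, which is half of the 48-byte buffer that also carries the GetProperty request.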

Packets from the server are more complex. If the packet is a reply to a previous request, bytes 4-7 contain the reply length. (The designers of the X11 protocol apparently felt that requests need never be more than 65K words long since the request length is limited to two bytes, while replies may need to be much longer, requiring 4 bytes for their length.) Errors and events, on the other hand, have no explicit length, but are always 32 bytes long. As a result, requests, replies, events, and errors must be processed separately, even though they are all essentially the same for tracing purposes.
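The corresponding computation for packets from the server is sketched below. It rests on two details of the X11 protocol encoding that are not spelled out above and should be read as assumptions of this sketch: a first byte of 1 marks a reply (0 marks an error, other values an event), and the reply length in bytes 4-7 counts the 4-byte words that follow the first 32 bytes. The function names are hypothetical, and the sketch does not cover the connection setup reply, which has its own format.

```c
#include <assert.h>
#include <stddef.h>

/* Read a 4-byte unsigned value in the connection's byte order. */
static unsigned long card32(const unsigned char *p, int msb_first)
{
    return msb_first
        ? ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
          ((unsigned long)p[2] << 8)  |  (unsigned long)p[3]
        : ((unsigned long)p[3] << 24) | ((unsigned long)p[2] << 16) |
          ((unsigned long)p[1] << 8)  |  (unsigned long)p[0];
}

/*
 * Given the first 8 bytes of a server packet, return its total length.
 * Errors and events are always 32 bytes; a reply is 32 bytes plus the
 * additional words counted in bytes 4-7 (assumed encoding, see lead-in).
 */
unsigned long server_packet_bytes(const unsigned char first8[8], int msb_first)
{
    assert(first8 != NULL);
    if (first8[0] != 1)
        return 32;                       /* error or event */
    return 32 + 4 * card32(first8 + 4, msb_first);
}
```

The InternAtom replies in Figure 3 begin 01 ff 00 03 00 00 00 00, so their reply length is 0 and each occupies exactly 32 bytes, matching the trace.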

3.2 Packet Format

Another difficulty is identifying the exact format of each packet. Requests are identified by a request opcode in their first byte. Similarly errors and events are identified by codes in their first or second bytes. Replies, on the other hand, are not directly identified in any way. Notice that not all requests generate replies. Of the 120 different X11 requests, only 40 generate replies. Since the communication between client and server is asynchronous, it is possible for the client to make several reply-generating requests without waiting for the replies.

To identify the reply packets (or the cause of any error packets), the client must record the sequence number of each request (request 1 is the first request after the initial setup communication). Each reply (or error) contains the sequence number of the request which caused it. To identify the type of reply, the client (and XSCOPE) must keep a table of the sequence numbers of outstanding requests and match the sequence numbers to identify the type of request, and hence the type of the reply.
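A table of outstanding requests can be as simple as the sketch below. The names (outstanding, note_request, match_reply) and the fixed table size are hypothetical; only reply-generating requests need to be recorded, and a 16-bit sequence number is assumed, as carried in replies and errors.

```c
#include <assert.h>
#include <stddef.h>

#define OUTSTANDING_MAX 64   /* illustrative limit */

struct outstanding {
    unsigned short seq[OUTSTANDING_MAX];     /* sequence number of request */
    unsigned char  opcode[OUTSTANDING_MAX];  /* opcode of that request */
    size_t count;
};

/* Record a reply-generating request. Returns 0, or -1 if the table is full. */
int note_request(struct outstanding *t, unsigned short seq, unsigned char opcode)
{
    assert(t != NULL);
    if (t->count == OUTSTANDING_MAX)
        return -1;
    t->seq[t->count] = seq;
    t->opcode[t->count] = opcode;
    t->count++;
    return 0;
}

/*
 * Find the request that a reply with this sequence number answers.
 * Returns its opcode and removes the entry, or returns -1 if unknown.
 */
int match_reply(struct outstanding *t, unsigned short seq)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->count; i++) {
        if (t->seq[i] == seq) {
            int opcode = t->opcode[i];
            t->count--;                      /* move last entry into the gap */
            t->seq[i] = t->seq[t->count];
            t->opcode[i] = t->opcode[t->count];
            return opcode;
        }
    }
    return -1;
}
```

In the trace of Figure 3, GetProperty (opcode 0x14) is request 2 and the first InternAtom (opcode 0x10) is request 3, so a reply carrying sequence number 3 would be decoded as an InternAtom reply.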

Once we have identified the request, reply, event, or error code, we are ready to print the contents of the packet, according to the type of the packet and its code. Here several possibilities exist. The brute force approach would be to write, by hand, for each type of packet, a procedure which would print the contents of the packet. However, since there are 120 different requests, 40 replies, 34 events, and 17 errors, each with an average of 6 fields, this would be a substantial amount of rather dull code.

For X10, we avoided brute force coding by using table-driven code. Each request, reply, event, and error was described by an entry in a table. The table entry consisted mainly of a list of fields. Each field indicated its starting byte, length, type and name. One routine would pick up a packet and its description and could then print each field in the packet. Several problems arose when we tried to use a similar approach to X11.
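The table-driven scheme can be illustrated with the sketch below. It is not the X10 XSCOPE code: the field structure, the two field types, and the function names are hypothetical, values are read MSB first, and output goes to a caller-supplied string rather than being printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum field_type { FT_CARD, FT_HEX };   /* hypothetical field types */

struct field {
    int start;                 /* starting byte within the packet */
    int length;                /* length in bytes (1, 2, or 4) */
    enum field_type type;      /* how to print the value */
    const char *name;          /* label to print */
};

/* Read a field value, most significant byte first. */
static unsigned long field_value(const unsigned char *pkt, const struct field *f)
{
    unsigned long v = 0;
    int i;

    for (i = 0; i < f->length; i++)
        v = (v << 8) | pkt[f->start + i];
    return v;
}

/*
 * Format one "name: value" line per table entry into out.
 * Returns the number of characters written, or -1 if out is too small.
 */
int format_fields(const unsigned char *pkt, const struct field *fields,
                  size_t nfields, char *out, size_t cap)
{
    size_t used = 0, i;

    assert(pkt != NULL && fields != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < nfields; i++) {
        unsigned long v = field_value(pkt, &fields[i]);
        int n;

        if (fields[i].type == FT_HEX)
            n = snprintf(out + used, cap - used, "%s: %08lx\n", fields[i].name, v);
        else
            n = snprintf(out + used, cap - used, "%s: %lu\n", fields[i].name, v);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

One routine plus one table per packet suffices as long as every field sits at a fixed offset with a fixed length, which is the property that the following sections show X11 does not always have.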

3.3 Types

Our first problem was with types. With X10 there were only 4 or 5 distinct types of objects: integers, identifiers, strings, and so on. With X11, however, there were 112 different types of objects. In large part this is because the X11 protocol defines types much more specifically than with X10. A type which was simply treated as a short in X10 might be an enumerated type (with all legal values given names) or a bitmask set type (with each legal bit given a name) in X11.

Not all types for X11 are explicitly defined, however. A number of types are defined implicitly, in the records that use them. These types are anonymous types. This is true even when the same type is used in several different packets; for example, the same COLORMASK item is defined both for the StoreNamedColor request and as a subcomponent of the COLORITEM type used in the StoreColors request. Since the type cannot be referred to by name, its definition is repeated for both uses.

Indeed, the use of anonymous types seems to have hidden the possibility of re-using the same type in several places. For example, the reply to the GetKeyboardControl request has a field which is anonymously defined as 0 for Off and 1 for On. The ChangeKeyboardControl has another anonymous type with the same definition, and a different anonymous type with 0 for Off, 1 for On, and 2 for Default. In addition, other types (most of them anonymous) define the same 0/1 values as False/True, No/Yes, None/All, Off/On, Uninstalled/Installed, Disabled/Enabled, Reset/Activate, Insert/Delete, Top/Bottom, and Synchronous/Asynchronous.

Even more types are generated by the different interpretations of special values of resource ids. For example, a window identifier may have its zero value interpreted as ``None'' (for the SetInputFocus request) or ``PointerWindow'' (for the SendEvent request). Seven of the basic resource identifiers have differing special interpretations of some values.

On the other hand, the names ``MSB first'' and ``LSB first'' are defined, in two different contexts, as either 0x42 and 0x6C (setup message from the client to the server) or 1 and 0 (setup reply from the server to the client).

Given the list of types, however, we generated a large set of procedures, one per type, each of which prints a value of that type. For most types, it was clear what to print, but there were some problems. For example, INT8 was defined as a signed 8-bit integer, and CARD8 was an unsigned 8-bit integer. A similar type was the BYTE, which was defined simply as an 8-bit value. The important difference between a BYTE and a CARD8 or an INT8 was never given.

3.4 Applicability of Table-Driven Code

Given the list of X11 types, the procedures to print them, and the definitions of the various packets, it was hoped that a table-driven printing capability could be written. Based upon the X10 program, we expected to need to know the name, start byte, length and type of each field in a packet. We soon encountered major problems, however.

The major difficulty with a table-driven decoding of the X11 protocol is the inclusion of variable-length items in the requests and replies. The length of such an item is not known ahead of time, so it cannot be put in the decoding tables.

It is clear that variable-length items are needed. For example, the OpenFont request needs a font name. A font name is a variable-length string. This is easily solved by providing the length of the string (n) as the value of another field (bytes 8 and 9) in the OpenFont request. A table-driven decoding could include information that a field item is of variable length and indicate another field whose value defines that length.

This approach is further complicated by the fact that the ``quantum'' size of a variable-length item need not be a single byte. Thus, although the variable-length name for OpenFont is a variable length number of characters (each needing one byte), the variable length item for the reply to the ListProperties request is a list of ATOMs. Each ATOM requires four bytes. The reply to the GetMotionEvents request is a list of TIMECOORDs, each of which takes 8 bytes. So we need to allow variable-length lists; each list might have a differing size of list item.

More generally each packet may have more than one variable-length list. For example, the reply for the AllocColorCells request has two lists, one of pixels and another of masks. The length of the pixel list is given in bytes 8 and 9; the length of the mask list is given in bytes 10 and 11.

We could then describe each packet by a list of fields -- each field having a start byte, length, type and name -- and a list of variable length lists -- each list having a start byte, type of list element, and a descriptor of another packet field which gives the length of the list. The start byte of a second list would follow the end of the first list. If there were three lists, the start of the third would follow the end of the second, and so on.
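A descriptor for such lists, and the chaining of their start bytes, might be sketched as follows. The names list_desc and locate_lists are hypothetical, counts are read MSB first, and the example values used with it (lists beginning at byte 32 of a reply, 4 bytes per pixel or mask) are assumptions of this illustration rather than statements taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct list_desc {
    size_t count_start;   /* offset of the field holding the element count */
    size_t count_len;     /* width of that field in bytes */
    size_t elem_size;     /* bytes per list element (the "quantum") */
};

/* Read the element count for one list, most significant byte first. */
static size_t read_count(const unsigned char *pkt, const struct list_desc *d)
{
    size_t v = 0, i;

    for (i = 0; i < d->count_len; i++)
        v = (v << 8) | pkt[d->count_start + i];
    return v;
}

/*
 * Compute the start byte of each list in turn: the first list begins at
 * first_start and every later list begins where the previous one ends.
 * Returns the offset just past the last list.
 */
size_t locate_lists(const unsigned char *pkt, const struct list_desc *lists,
                    size_t nlists, size_t first_start, size_t *starts)
{
    size_t offset = first_start, i;

    assert(pkt != NULL && lists != NULL && starts != NULL);
    for (i = 0; i < nlists; i++) {
        starts[i] = offset;
        offset += read_count(pkt, &lists[i]) * lists[i].elem_size;
    }
    return offset;
}
```

For an AllocColorCells reply with 3 pixels and 2 masks, the descriptors {8, 2, 4} and {10, 2, 4} with a first start of 32 place the pixel list at byte 32 and the mask list at byte 44, ending at byte 52.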

A variant on this approach would handle ValueLists where the number of elements in the list is determined by the number of one bits in a bit mask in another field rather than the integer value of another field.
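Counting the elements of a ValueList then reduces to counting one bits, as in this sketch (the function names are hypothetical):

```c
#include <assert.h>

/* Count the one bits in a value-mask; each set bit selects one list element. */
unsigned value_list_count(unsigned long mask)
{
    unsigned n = 0;

    while (mask != 0) {
        mask &= mask - 1;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Usage: the CreateGC value-mask 0000000c in Figure 3 selects two values. */
void value_list_example(void)
{
    assert(value_list_count(0x0000000cUL) == 2);
}
```

This agrees with Figure 4, where the same mask is decoded as foreground | background and is followed by exactly those two values.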

Unfortunately, the X11 protocol does not always define the length of its variable-length lists. For example, consider the length of the list in the QueryColors request. The length of the list is not given in the request, but must be computed from the request length. Specifically, if the length of the list is n, then the request length will be 2+n. (Remember that the request length is the number of 4-byte words in the request; a request length of 2+n is 8 bytes for the fixed part of the request plus n 4-byte list entries.) Other packets, such as the FreeColors request and the reply for GetImage, also require that the length of data items be calculated from the request (or reply) length. Perhaps the most complex is the ChangeProperty request.
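The QueryColors arithmetic can be written out as a short sketch; the function names are hypothetical, and a length too small to hold the fixed part is reported as an error.

```c
#include <assert.h>

/*
 * Recover the number of list entries in a QueryColors request from its
 * request length in words: length = 2 + n, so n = length - 2.
 * Returns -1 if the length cannot even hold the 8-byte fixed part.
 */
long query_colors_count(unsigned long request_words)
{
    if (request_words < 2)
        return -1;
    return (long)(request_words - 2);
}

/* Usage: a 20-byte request is 5 words: 8 fixed bytes plus 3 entries. */
void query_colors_example(void)
{
    assert(query_colors_count(5) == 3);
}
```

A table entry cannot express this rule with a simple "length is in field X" reference, because the offset of 2 words is specific to this request.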

Another problem arises with the PolyText8 and PolyText16 requests. These requests have lists of variable-length strings. The number of strings is never given; it is determined simply by the point at which the request length runs out of room for further string headers.

As a result of the complexity of the X11 Protocol, it seems unlikely that it could be printed with a table-driven procedure.

3.5 Editing the Protocol into Code

As we were attempting to construct the tables for the X11 Protocol, however, we noticed the following. To construct the tables, we extracted a description of the requests, replies, errors, events, and types from the troff source of the Protocol document. We then transformed these text descriptions into a set of procedure calls to construct a table entry. For example, the troff source for the BOOL type was:

        0       False
        1       True

This was transformed using the GNU Emacs editor [Stallman 1988] into:

  DefineEValue(p, 0L, "False");
  DefineEValue(p, 1L, "True");

We decided that rather than having a set of procedures to construct a set of tables and a routine to interpret the tables, we could transform each packet definition into a routine to print that packet directly. While this would create some 217 packet printing procedures, it would be no more total code than the one table-driven packet printing routine plus the code to initialize the table. In addition, this approach would allow limited hand coding to compensate for the complexity of the X11 Protocol without requiring the design, construction, and interpretation of a very complex table.

With this approach, we took the descriptions of the various packets and, using the GNU Emacs text editor, converted each of them into a routine to print that packet. Once a general transformation was designed, all the requests, replies, events, and errors could be automatically transformed directly from their descriptions in the Protocol document to routines to print them.

For example, the GetWindowAttributes request was described as:

.PN GetWindowAttributes
        1       3               opcode
        1                       unused
        2       2               request length
        4       WINDOW          window

This was automatically transformed into:

     unsigned char *packet;
  /* Request GetWindowAttributes is opcode 3 */
  PrintField(packet, 0, 1, REQUEST, REQUESTHEADER) /* GetWindowAttributes */ ;
  PrintField(packet, 2, 2, CONST2(2), "request length");
  PrintField(packet, 4, 4, WINDOW, "window");

The PrintField routine is the major printing code in XSCOPE. The PrintField routine takes a packet buffer (packet), the start byte, length, type, and name of a field in the packet. It prints the name of the field and then calls a procedure which understands how to print values of the given type. The type is either a built-in type, an enumerated type, a set type, or a record type. The type printing is table-driven. For example, the enumerated type printing procedure gets the value of the field, and then scans a list of value/name pairs to determine the name to print for this value.
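The scan of value/name pairs for an enumerated type might look like the sketch below. The names enum_value, bool_values, and enum_name are hypothetical; in particular, the array shown here merely stands in for whatever list the DefineEValue calls build.

```c
#include <assert.h>
#include <stddef.h>

struct enum_value {
    long value;           /* encoded value in the packet */
    const char *name;     /* name to print for that value */
};

/* The BOOL type from the example above: 0 is False, 1 is True. */
const struct enum_value bool_values[] = {
    { 0, "False" },
    { 1, "True" },
};

/*
 * Scan the value/name pairs for `value`. Returns the matching name, or
 * NULL when the value is not legal for the type, which lets the caller
 * mark it as an illegal enumerated value.
 */
const char *enum_name(const struct enum_value *values, size_t n, long value)
{
    size_t i;

    assert(values != NULL);
    for (i = 0; i < n; i++)
        if (values[i].value == value)
            return values[i].name;
    return NULL;
}
```

Returning NULL for an unknown value gives the printing code a natural place to flag illegal values instead of printing a misleading name.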

4.0 Evaluation of XSCOPE

XSCOPE has already been of value in our work with the X11 server. For example, XSCOPE marks all illegal enumerated values. This immediately showed that the Berkeley backing store changes to the X11 server were returning a value of 64 for the save-unders field of the X11 server set-up reply, although the protocol indicates that this field should be a BOOL with values of 0 and 1 only. In addition, it has pinpointed other problems as being with the server and not with the clients.

While XSCOPE is a valuable tool in its current form, it is even more valuable as an indicator of the possibilities that it suggests. In particular, notice that the input to the X11 client comes, not from X11 directly, but indirectly through XSCOPE. This suggests that XSCOPE could transform or generate events for testing purposes. Since the output from XSCOPE could be based upon a file of pre-defined events which were written to the client at a defined rate, it would be possible to repeatedly test clients with the same input and measure their response. This would eliminate the variability resulting from using a human to generate keystrokes and mouse movement.

In the other direction, XSCOPE could store the sequence of requests generated by a client and repeat them at a later time, providing a record and playback capability which could be used to either test an X11 server or provide repeatable tutorial performances of an application.

A more esoteric possibility is the idea of replicating the commands sent to XSCOPE to several different X11 servers, causing the same display to appear on several different displays [Janssen 1988].

5.0 References

  1. [Gosling 1986] James Gosling, ``SUNDEW: A Distributed and Extensible Window System'', Conference Proceedings, 1986 Winter USENIX Technical Conference, Denver, (January 1986), pages 98-103.

  2. [Halasz et al 1987] Frank G. Halasz, Robert S. Pettengill, James L. Peterson, Tom J. Smith, and Zvi Weiss, ``The Window Server Independent Interface'', Technical Report STP-348-87, Software Technology Program, MCC, (October 1987).

  3. [Janssen 1988] William C. Janssen, Jr., ``Design and Implementation of a Virtual Interaction Surface Independent Interface'', Technical Report STP-166-88, Software Technology Program, MCC.

  4. [Scheifler and Gettys 1986] Robert W. Scheifler and Jim Gettys, ``The X Window System'', ACM Transactions on Graphics, Volume 5, Number 2, (April 1986), pages 79-109.

  5. [Scheifler 1988] Robert W. Scheifler, ``X Window System Protocol Release 2: X Version 11'', Laboratory for Computer Science, Massachusetts Institute of Technology, (1988), 149 pages.

  6. [Stallman 1988] R. M. Stallman, GNU Emacs Manual, Version 18, Free Software Foundation, Boston, MA, 1988.