Moving to a 64-bit Architecture

James L. Peterson

July 1996

Memory has always been an important part of the architecture of a computer system. Early systems had only limited amounts of main memory, and used correspondingly small addresses: the PDP-11 used 16-bit addresses; the IBM 360 used 24-bit addresses. Then, in the early 1970's, there was a trend to 32-bit addresses, even in microprocessors. Today, 32-bit addresses are fairly standard for serious computing architectures.

A 32-bit address allows 4Gbytes of memory to be addressed. However, with the increase in memory chip capacity and the drop in the cost of memory chips, machines exist today with 256Mbytes of main memory. In a short time, it will be possible to put 4Gbytes of memory on a computer system and, shortly after that, more. This puts system designers back into the problem of designing mechanisms for addressing more than 4Gbytes of memory. While it would be possible to reuse the techniques (base, bank, and segment registers) that were applied when 16-bit addresses had to reach machines with 128Kbytes of memory, these are at best short-term solutions. The long-term solution is clear: increase the number of bits in the address field.

Beyond 32 bits, one could suggest 48 bits, but given that the change from 16 bits to 32 bits lasted less than 20 years, an increase to 64 bits seems preferable. An increase to 64-bit addressing allows both very large main memories and applications which need very large memories (virtual or real).

Changing to a 64-bit memory address requires some obvious changes in the underlying computer architecture:

  1. Hardware structures, such as memory busses, must be expanded to support the 64-bit address. For current architectures, it would be possible to use two cycles on a 32-bit bus, but eventually a 64-bit bus will be needed.

  2. Registers will need to be increased from 32 bits to 64 bits. The hardware which operates on these registers (adders, comparison units) will also need to be increased.

  3. Instruction fields may also need to be increased in size to allow for larger address fields, or for 64-bit constants.

While some amount of re-design may be necessary, many of these changes simply require increasing the degree of replication from 32 to 64. Thus, the hardware architectural changes for moving from 32 to 64 bits do not appear to be a major problem.

Similarly, we would hope that for software the change from 32 bits to 64 bits should be a simple change of the assembler and compiler to generate code for the new architecture and a recompilation of the rest of the software system. The change in address length will, however, require some change to the basic data types.

Specifically, the size of a ``char *'' data type (pointer to character) becomes 64 bits, rather than 32 bits. In addition, we may want or need to consider changing the other basic data types. For standard 32-bit architectures, these are:

	 char *      32 bits
	 long	     32 bits
	 int	     32 bits
	 short	     16 bits
	 char	      8 bits

	 float	     32 bits
	 double	     64 bits

Changing to a 64-bit architecture, we know that the ``char *'' becomes 64 bits. The ``long'' data type should also be changed to 64 bits. The ``short'' and ``int'' data types require somewhat more thought. The basic relationship between ``long'', ``int'', and ``short'' is that

    sizeof(short) <= sizeof(int) <= sizeof(long)

Thus, it would be possible to leave ``short''s at 16 bits and ``int''s at 32 bits while increasing ``long''s to 64 bits. However, programs typically assume that an ``int'' is the same size as either a ``short'' or a ``long'', and under this 16/32/64 proposal an ``int'' is neither. The general rule for ``int''s is that the language implementation is free to choose whatever size is ``best'' -- either a ``short'' or a ``long'' -- to fit the most common machine size. Accordingly, if a value fits in a ``short'', it may be declared an ``int'', but if it must be more than 16 bits, it should be declared a ``long''.

An alternative proposal is to keep the ``int'' the same size as a ``long'', increasing it to 64 bits, while at the same time increasing the size of a ``short'' to 32 bits. This leaves only the ``char'', which would be expected to stay at 8 bits. However, we propose that the size of the ``char'' be increased to 16 bits, for several reasons. First, doubling the size of the ``char'' means that the size of every basic data type is doubled in going from a 32-bit architecture to a 64-bit architecture. This keeps the relationships between these basic types the same (same number of chars per short; same number of chars per long; and so on).

In addition, increasing the size of the ``char'' to 16 bits allows for a much richer set of characters. The basic purpose of a ``char'' is to store one ``character''. For early systems, a character needed only 6 bits -- to represent 64 different characters. In the 1960's, there was a movement from 6-bit characters to 8-bit characters, resulting in the ASCII and EBCDIC character sets. Actually, of course, neither of these character sets was fully populated -- seven bits would have been sufficient.

In the past decade, however, it has become obvious that to truly represent all ``characters'' used in printed writing, more than 128 characters are needed. A full 8 bits is needed for the ISO Latin-1 character set, which provides representation for the Western European languages. The representation of a wider range of languages, such as Japanese, requires at least 16 bits. Current coding generally requires a multi-byte programming model for non-European languages. Moving to a 16-bit character would allow a simple programming model for an extended international character set.

The floating point data types are defined more by the IEEE data representation standard than by the machine architecture, and can be left alone.

Thus, it would appear that the following sizes are reasonable for the basic data types.

	     	 32-bit	    64/32/16/8	    64/64/32/8	    64/64/32/16

 char *      32 bits  	    64 bits  	    64 bits  	    64 bits
 long	     32 bits	    64 bits  	    64 bits  	    64 bits
 int	     32 bits	    32 bits	    64 bits	    64 bits
 short	     16 bits	    16 bits	    32 bits	    32 bits
 char	      8 bits	     8 bits	     8 bits	    16 bits

Another advantage of the 64/64/32/16 scheme is that since 8-bit objects are no longer accessible, they need not be addressable. This would allow addresses to refer to 16-bit objects, and effectively double the maximum addressable memory.

Cray, which has a 64-bit address architecture, uses the 64/64/32/8 scheme.

With any of these changes, two sorts of software problems will occur with a change in the basic data types:

  1. Explicit Knowledge of the size of the data types occurs in code that defines bit fields, and/or does its own packing and unpacking by shifting and masking. Troff, for example, explicitly packs and unpacks characters into ``long''s. The code that performs the pack/unpack operations ``knows'' that there are 4 characters of 8 bits each in a 32-bit ``long''. This code can probably be found by searching for the C shift and mask operators.

  2. Implicit Knowledge of the size of data types occurs when code overlays one data structure on the memory representation of another, and ``knows'' that the parts of the structure have the proper positioning and alignment for use as different data types. The ``union'' data type is sometimes misused for this purpose, but casting, and implicit casting through procedure parameters (passing a ``char *'' to a procedure that expects a ``long''), are also used.

Explicit packing and unpacking can be handled with the definition and use of appropriate architectural constants: number of bits per char (CHAR_BIT), number of bits per short, number of bits per long (LONG_BIT), and so on.

Another possibility, which would encourage the production of code that works correctly with different sizes of the basic data types (by using architectural constants, for example), would be to support several schemes, selected by a compiler option. Any of the 64/32/16/8, 64/64/32/8, or 64/64/32/16 schemes could then be used for a given program; properly written code would work unchanged under each of them.

To a degree, the basic problem here is that the conventional C data types provide access to only three objects: ``char'', ``short'', and ``long''. With the move to a 64-bit architecture, we have 4 objects in the hardware: 8-bit, 16-bit, 32-bit, and 64-bit quantities. With only 3 names for 4 objects, some object must be unnameable. The obvious alternative is to introduce another name. For example, with the 64/64/32/16 scheme, a new data type ``byte'' could be used to allow access to 8-bit objects.

It could also be suggested that this problem may recur when a 128-bit architecture is considered. Any current solution should therefore consider the impact of a later move from a 64-bit architecture to a 128-bit architecture.