Memory has always been an important part of the architecture of a computer system. Early systems had only limited amounts of main memory, and used correspondingly small addresses. The PDP-11 used 16-bit addresses; the IBM 360 used 24-bit addresses. Finally in the early 1970's there was a trend to 32-bit addresses, even in microprocessors. Today, 32-bit addresses are fairly standard for serious computing architectures.
A 32-bit address can reach 4Gbytes of memory. However, with the increase in memory chip capacity and the drop in the cost of memory chips, machines exist today with 256Mbytes of main memory. In a short time, it will be possible to put 4Gbytes of memory on a computer system and, shortly after that, more. This puts system designers back at the problem of designing mechanisms for addressing more than 4Gbytes of memory. While it is possible to use the same techniques that were used when 16-bit addresses were used on machines with 128Kbytes (base, bank and segment registers), these are at best short term solutions. The long term solution is clear: increase the number of bits in the address field.
Beyond 32 bits, one could suggest 48 bits, but given that the change from 16 bits to 32 bits lasted less than 20 years, an increase to 64 bits seems preferable. An increase to 64-bit addressing allows both very large main memories and applications which need very large memories (virtual or real).
Changing to a 64-bit memory address requires some obvious changes in the underlying computer architecture:
While some amount of re-design may be necessary, many of these changes simply require increasing the degree of replication from 32 to 64. Thus, the hardware architectural changes for moving from 32 to 64 bits do not appear to be a major problem.
Similarly, we would hope that for software the change from 32 bits to 64 bits should be a simple change of the assembler and compiler to generate code for the new architecture and a recompilation of the rest of the software system. The change in address length will, however, require some change to the basic data types.
Specifically, the size of a ``char *'' data type (pointer to character) becomes 64 bits, rather than 32 bits. In addition, we may want or need to consider changing the other basic data types. For standard 32-bit architectures, these are:
char *    32 bits
long      32 bits
int       32 bits
short     16 bits
char       8 bits
float     32 bits
double    64 bits
Changing to a 64-bit architecture, we know that the ``char *'' becomes 64 bits. The ``long'' data type should also be changed to 64 bits. The ``short'' and ``int'' data types require somewhat more thought. The basic relationship between ``long'', ``int'', and ``short'' is that
sizeof(short) <= sizeof(int) <= sizeof(long)
Thus, it would be possible to leave ``short''s at 16 bits and ints at 32 bits while increasing ``long''s to 64 bits. However, programs typically assume that an int is either a ``short'' or a ``long'', and the 16/32/64 proposal means that an int is neither a ``short'' nor a ``long''. The general statement about ints is that the language implementation is free to choose whatever size is ``best'' for the size of an int -- either a ``short'' or a ``long'', to fit the most common machine size. Accordingly, if a value can be ``short'', it may be an int, but if it must be more than 16 bits, it should be declared a ``long''.
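The ordering constraint and the "int is either a short or a long" assumption can both be checked mechanically. The following is a minimal sketch, assuming a C11 compiler (for static_assert); the function name int_is_short_or_long is hypothetical and not part of any standard library.

```c
#include <assert.h>

/* Compile-time check of the ordering
 * sizeof(short) <= sizeof(int) <= sizeof(long).
 * static_assert comes from <assert.h> in C11 (assumption: C11 or later). */
static_assert(sizeof(short) <= sizeof(int), "short must not be wider than int");
static_assert(sizeof(int) <= sizeof(long), "int must not be wider than long");

/* Hypothetical helper: returns 1 when int has the same size as short
 * or as long on this compiler, and 0 when it is a distinct third size
 * (as it would be under a 16/32/64 assignment). */
int int_is_short_or_long(void)
{
    return sizeof(int) == sizeof(short) || sizeof(int) == sizeof(long);
}
```

A program that silently relies on an int being interchangeable with one of the other two types could call int_is_short_or_long() at startup and refuse to run when it returns 0.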
An alternative proposal is to keep the int the same size as a ``long'', increasing it to 64 bits, while at the same time increasing the size of a ``short'' to 32 bits. This leaves only the ``char'', which would be expected to stay at 8 bits. However, we propose that the size of the ``char'' be increased to 16 bits, for several reasons. First, doubling the size of the ``char'' means that the size of all basic data types is doubled in going from a 32-bit architecture to a 64-bit architecture. This keeps the relationships between these basic types the same (same number of chars per short; same number of chars per long; and so on).
In addition, increasing the size of the ``char'' to 16 bits allows for a much richer set of characters. The basic purpose of a ``char'' is to store one ``character''. For early systems, a character needed only 6 bits -- to represent 64 different characters. In the 1960's, there was a movement from 6-bit characters to 8-bit characters, resulting in the ASCII and EBCDIC character sets. Actually, of course, neither of these character sets was fully populated -- seven bits would have been sufficient.
In the past decade, however, it has become obvious that to truly represent all ``characters'' used in printed writing, more than 128 characters are needed. A full 8 bits is needed for the ISO Latin-1 character set, which provides representation for the Western European languages. The representation of a wider range of languages, such as Japanese, requires at least 16 bits. Current coding generally requires a multi-byte programming model for non-European languages. Moving to a 16-bit character would allow a simple programming model for an extended international character set.
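The difference between the two programming models shows up even in something as simple as counting characters. The sketch below is illustrative only: it assumes UTF-8 as the multi-byte encoding (the text does not name one), and wide_char is a hypothetical stand-in for a 16-bit ``char'', assumed here to be an unsigned short of 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-byte model (UTF-8 assumed for illustration): a character may
 * occupy several 8-bit units, so counting characters means inspecting
 * every byte and skipping continuation bytes (bit pattern 10xxxxxx). */
size_t count_multibyte_chars(const unsigned char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != 0; s++) {
        if ((*s & 0xC0) != 0x80)   /* this byte starts a new character */
            count++;
    }
    return count;
}

/* Hypothetical 16-bit character type (assumes unsigned short is 16 bits). */
typedef unsigned short wide_char;

/* Fixed-width model: one storage unit per character, so the character
 * count is simply the number of units before the terminator. */
size_t count_wide_chars(const wide_char *s)
{
    size_t count = 0;
    assert(s != NULL);
    while (s[count] != 0)
        count++;
    return count;
}
```

With fixed-width characters, indexing the n-th character is also a plain array access, whereas the multi-byte model has to scan from the start of the string.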
The floating point data types are defined more by the IEEE data representation standard than by the machine architecture, and can be left alone.
Thus, it would appear that the following sizes are reasonable for the basic data types.
type      32-bit    64/32/16/8  64/64/32/8  64/64/32/16
char *    32 bits   64 bits     64 bits     64 bits
long      32 bits   64 bits     64 bits     64 bits
int       32 bits   32 bits     64 bits     64 bits
short     16 bits   16 bits     32 bits     32 bits
char       8 bits    8 bits      8 bits     16 bits
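To see which of these schemes a given compiler actually implements, the sizes can be reported in the same long/int/short/char notation used for the scheme names. This is a small sketch; format_scheme and format_native_scheme are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Writes the widths in bits as "long/int/short/char", for example
 * "64/32/16/8", into buf. Returns 0 on success and -1 if buf is too
 * small to hold the whole string. */
int format_scheme(char *buf, size_t buflen, unsigned long_bits,
                  unsigned int_bits, unsigned short_bits, unsigned char_bits)
{
    int n;
    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "%u/%u/%u/%u",
                 long_bits, int_bits, short_bits, char_bits);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Describes the compiler in use, deriving every width from sizeof and
 * CHAR_BIT rather than from hard-coded numbers. */
int format_native_scheme(char *buf, size_t buflen)
{
    return format_scheme(buf, buflen,
                         (unsigned)(sizeof(long) * CHAR_BIT),
                         (unsigned)(sizeof(int) * CHAR_BIT),
                         (unsigned)(sizeof(short) * CHAR_BIT),
                         (unsigned)CHAR_BIT);
}
```

Calling format_native_scheme with a 32-byte buffer and printing the result gives a one-line description of the data model that can be compared against the columns above.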
Another advantage of the 64/64/32/16 scheme is that since 8-bit objects are no longer accessible, they need not be addressable. This would allow addresses to refer to 16-bit objects, and effectively double the maximum addressable memory.
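The doubling follows directly from the arithmetic: the number of distinct addresses is fixed by the address width, so the reachable memory scales with the size of the smallest addressable unit. A minimal sketch of that calculation follows; bytes_reachable is a hypothetical name, and the address width is capped well below 64 bits only so that the result fits in an unsigned long long.

```c
#include <assert.h>

/* Returns the number of 8-bit bytes reachable with address_bits-wide
 * addresses when each address names a unit of unit_bits bits.
 * Limits (chosen for this sketch): unit_bits is a multiple of 8 up to
 * 64, and address_bits is below 56 so the product cannot overflow. */
unsigned long long bytes_reachable(unsigned address_bits, unsigned unit_bits)
{
    assert(unit_bits > 0 && unit_bits <= 64 && unit_bits % 8 == 0);
    assert(address_bits < 56);
    return (1ULL << address_bits) * (unit_bits / 8);
}
```

For example, bytes_reachable(32, 8) is 4294967296 (4Gbytes) while bytes_reachable(32, 16) is 8589934592 (8Gbytes); the same factor of two applies to a 64-bit address.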
Cray, which has a 64-bit address architecture, uses the 64/64/32/8 scheme.
With any of these changes, two sorts of software problems will occur with a change in the basic data types:
Explicit packing and unpacking can be handled with the definition and use of appropriate architectural constants: number of bits per char (CHAR_BIT), number of bits per short, number of bits per long (LONG_BIT), and so on.
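As an illustration of packing code written against architectural constants, the sketch below packs characters into an unsigned long and unpacks them again using only CHAR_BIT, UCHAR_MAX, and sizeof, with no hard-coded 8 or 4. The names pack_chars, unpack_chars, and CHARS_PER_LONG are hypothetical; the code assumes a long holds more than one char.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of chars that fit in one long on this architecture. */
#define CHARS_PER_LONG (sizeof(unsigned long))

/* Packs the first n chars of src into one word, first char in the
 * most significant position of the used part. */
unsigned long pack_chars(const unsigned char *src, size_t n)
{
    unsigned long word = 0;
    size_t i;
    assert(src != NULL && n <= CHARS_PER_LONG);
    for (i = 0; i < n; i++)
        word = (word << CHAR_BIT) | src[i];
    return word;
}

/* Reverses pack_chars: extracts n chars from word into dst. */
void unpack_chars(unsigned long word, unsigned char *dst, size_t n)
{
    size_t i;
    assert(dst != NULL && n <= CHARS_PER_LONG);
    for (i = n; i > 0; i--) {
        dst[i - 1] = (unsigned char)(word & UCHAR_MAX);
        word >>= CHAR_BIT;
    }
}
```

Because the shift distance and the capacity both come from the implementation's own constants, the same source should behave consistently whether a char is 8 or 16 bits and whether a long is 32 or 64 bits.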
Another possibility would be to support several schemes, selected by a compiler option. Any of the 64/32/16/8, 64/64/32/8, or 64/64/32/16 schemes could then be used for a given program. This would tend to reinforce the production of code that works correctly under different sizes of the basic data types (by using architectural constants, for example), since properly written code would work under any of the schemes.
To a degree, the basic problem here is that the conventional C data types provide access only to three objects: ``char'', ``short'', and ``long''. With the move to a 64-bit architecture, we have 4 objects in the hardware: 8-bit, 16-bit, 32-bit, and 64-bit quantities. With only 3 names for 4 objects, some object must be unnameable. An obvious alternative is to introduce another name. For example, with the 64/64/32/16 scheme, a new data type ``byte'' could be used to allow access to 8-bit objects.
It could also be suggested that this problem may recur when a 128-bit architecture is considered. Any current solution should consider the impact of moving from the 64-bit architecture to a 128-bit architecture.