Chapter 7: LOADERS AND LINKERS

With the programming techniques which have been presented in Chapters 4, 5, and 6 fully understood, you have the basic information needed to program the MIX computer. In the next few chapters, we change our focus somewhat to consider not how to program the MIX machine, but rather its operation in normal use. The MIX computer, like all computers, is merely a simple electronic machine which can do no more than execute the simple instructions in its instruction set, although at very high speed. To make the computer useful, however, requires software. It is the software, the programs, which allows the computer to perform its many and varied functions.

Remember that all programs must be in the main memory of the computer, in machine language, in order for them to be executed. It is difficult to program in machine language, and so a number of programs have been written to ease the programming problem by allowing programs to be written in assembly or higher-level language, and then translated by the computer into the machine language required for execution.

We examine two of these programs in great detail: loaders and assemblers. These are considered for two reasons: first, they are widely used as tools for programming, and second, the techniques used in them are illustrative of the approaches taken in the writing of other system programs. We consider loaders here, and assemblers in Chapter 8. This order is used because loaders are much simpler than assemblers, although in typical usage the assembler is used first.

Let us briefly review the process of programming a computer. The programmer first conceives of a solution to the problem at hand and defines his solution in some English-like procedural manner. This may be in written form or strictly mental notes. Then the program is written. Let us assume the program is written in MIXAL. Now at this point the program exists in a well-defined representation. However, it cannot be executed by the computer because it is in the wrong representation. To be executed it must be in machine language in memory, not on cards in assembly language. Thus it must be translated into the correct form. A part of this translation is done by the assembler. The assembler could translate the assembly language into machine language and put it in memory for execution except for one problem: the assembler.

The assembler is a program. Like any other program, when it is being executed it resides in machine language in memory. The assembler is also a very large program. While assembling, it takes up almost all of available memory. The problem that this causes is that it leaves no place in memory into which the assembled program can be put. Hence, the assembler cannot both translate from assembly language to machine language and put the resulting program in memory.

The solution to this problem is quite simple. The assembler translates the assembly language into machine language and puts the resulting program on a secondary storage device, like a tape, drum, or disk. Then, when the assembler is finished with the assembly task, a separate program is called which does nothing but read the program into memory. This second program is called a loader. A loader reads (loads) a program from an input device into memory for execution.

There are several advantages to this scheme (in addition to the fact that it is necessary). The main advantage is that to execute a program it is no longer necessary to have an assembly language version of the program and reassemble it every time it is to be executed. Instead, it is only necessary to save the output of the assembler and execute the loader to load it into memory for execution. Thus, if for example the assembler writes the machine language out on magnetic tape, the reel of tape can be dismounted and stored for an indefinite period of time before execution. It can also be executed repetitively by simply mounting the tape and executing the loader to reload the same program. For programs which are run over and over, this can result in substantial savings in computer time. Most computers have methods of loading from punched cards or punched paper tape in addition to magnetic tape, drum, or disk.

There are several different types of loaders. The simplest is an absolute loader. By making the loader more sophisticated, more complex functions can be done by the loader, resulting in even better utilization of the computer. We consider the absolute loader first, and then based on some problems in the use of the absolute loader, discuss more sophisticated loaders.

7.1 ABSOLUTE LOADERS

An absolute loader is the simplest of loaders. Its function is simply to take the output of the assembler and load it into memory. The output of the assembler can be stored on any machine-readable form of storage, but most commonly it is stored on punched cards or magnetic tape, disk, or drum.

Absolute load format

The input to an absolute loader is generally not quite in machine language. Machine language is really defined only in terms of a program in memory. The input is in loader format, a form of representation for a program which is so close to machine language that the loader need do nothing more than store the program in the appropriate memory locations.

How does the loader know which addresses the instructions (and data) should be loaded into? The assembler is told by ORIG statements written by the programmer, and this information must be passed on to the loader by the assembler. Thus the input to the loader must consist of two pieces of information: the instruction or data to be loaded, and the address where it should be loaded.

In addition, the loader will need to know when to stop loading, so we need some kind of an end-of-file marker at the end of the loader input. When the loader stops, it should transfer control to the newly loaded program. This means that a starting address is also needed.

All of these considerations lead to the following statement of the input to an absolute loader: a sequence of pairs of addresses and the values to be loaded into those addresses, followed by a starting address. The loader loads the values into the addresses until the starting address is reached, at which point it jumps to the starting address. This defines the basic structure of the loader program, and indicates what format the loader input should be in.

One approach would be to use two words for each word to be loaded by the loader. The first word contains the address where the value is to be loaded and the second contains the value to be loaded. Thus loader input would be a sequence of these pairs. A flag is needed to signal the end. We could use the sign bit to signal the end by setting it to "+" for the address/value pairs and "-" for the starting address (which comes at the end). However, if the address were zero, it would be difficult to distinguish a positive sign from a negative sign, so instead let us put the address in bytes 0:2 and use byte 5 to signal an address/value pair (byte 5 = 1) or the starting address (byte 5 = 0). Notice that by putting the address in bytes 0:2, it may be possible to use indirect addressing to use the address.

Our load format can thus be as diagrammed in Figure 7.2. This load format certainly supplies the loader with all the information it needs to load. However, it is probably not very efficient about it. Consider the sequence of addresses where words should be loaded. Suppose the loader has just finished loading a word into location 1000. What is the most likely address for the next load? Probably 1001. And after 1001? Probably 1002. And so on. Most of the code and data to be loaded will be placed in sequential locations in memory. The only reason why this would not occur would be due to ORIG statements in the assembly language program, or the end of the program.

We can take advantage of this to reduce the size of our loader format for a given program. Instead of giving one load address, LA, and one value to be stored at location LA, we give one load address, LA, a number, N, and N values to be loaded into locations LA, LA+1, LA+2, …, LA+N-1. Each of these will be called a loader record. The load address and number of words will be the first word of the loader record, and will be followed by the values to be stored, one per word for the next N words. The starting address will still be the last value and will be indicated by an N of 0.

This revised format reduces the size of our loader image considerably. In the old loader format, we had one word of overhead for every value to be loaded. With our new format, we have only one word of overhead per loader record. If loader records average 100 words, then out of 100 words of loader format we have 99 words to load, while with the old format we have only 50 words.

The amount of memory needed to store a program in loader format is very important for two reasons. One is that the less efficient loader formats will take more space on the storage media. The other is that loading is a very I/O-bound process; virtually no computation is needed. Thus, reducing the size of the input reduces the amount which the loader must read, and hence the amount of time to load.

Error checking

All programs should be correct, and this is especially true for systems programs. Systems programs are used by many computer users, not just by the programmer who writes them. Thus errors in systems programs may result in expensive malfunctions of the computer which cost dearly in both time and money (and sometimes life). It is therefore wise to take a very conservative approach to writing systems programs.

Absolute loaders are relatively simple programs and hence most of the errors which may occur are limited to errors in the input to the loader. The input to the loader is normally produced as the output of an assembler, compiler, or other program, but even so, it may have errors. These errors must be checked for and appropriate steps taken if they are discovered.

What are appropriate steps? Almost all loader errors will result in being unable to correctly load the program. Hence in the case of loader errors, it would seem most reasonable to halt the loading process, after printing an appropriate error message.

There are two types of errors which can occur in loader input. One is the result of incorrect generation by the program which produces the loader input, and the other is the result of incorrect storage or transmission by the I/O devices involved. The errors of the first type which could occur would consist mainly of illegal load addresses or illegal values of N.The load address, LA, being an address, should obviously be greater than or equal to zero and less than 4000. Similarly, the number of values to be loaded, N, should be greater than or equal to zero, and less than or equal to 4000. In addition, since the last value loaded will be loaded at address LA+N-1, this address must be less than 4000 also. Thus we have the following requirements:

0 ≤ LA < 4000, 0 ≤ N ≤ 4000, LA+N-1 < 4000

In fact, the restrictions on addresses must be even stronger. Certainly it is the case that each address must be a legitimate machine address, but in addition, we cannot allow a load address or start address which would be an address in the loader either. The loader is a program and resides in memory. Since its code must remain correct in order for its function to be correctly carried out, we must prevent user code from being loaded into those locations which the loader occupies.

This may cause a problem, since this limits the size of user programs. The limitation is in fact quite reasonable. An absolute loader is quite small, occupying less than several hundred words. Thus most of memory is available for user programs. If all of memory is needed by a program, then it may be possible to overlay the loader with a large array, table, or I/O buffer whose initial value is unimportant. There are few programs that consist of 4000 words of solid code.

To allow the loader to stay out of the way of most user programs, it is generally written to occupy the last locations of memory. Most programmers write their programs to start at location 0. Thus placing the loader in high core means that it will seldom interfere with the user program. This effectively changes the limit 4000 in the above inequalities to 4000 -- (size of the loader).

The second kind of error which can occur is an error by the I/O devices involved in the output, storage, and input of the loader input. It is not unknown for paper tape readers, or card readers, to misread a hole as a non-hole or vice versa, or to shift the position of a hole. Magnetic tape is generally more immune to this sort of problem due to parity checks, but still failures are known to have occurred. The errors may even occur in the CPU rather than in the I/O devices or in the electronics between the two devices.

Errors caused by storage or transmission of the loader input may cause illegal addresses or counts to be used, and hence would be detected by the above checks. However, it is more likely that they will occur in the values to be stored in memory, and hence be unnoticed, until the loaded program executes the incorrect instruction. Any possible number is permissible for the values to be loaded, and so no simple check (such as a range check) can be put on them to catch I/O errors.

The method most commonly used to detect I/O errors in loader input is a checksum. The idea of a checksum is similar to the idea of parity. A checksum is computed and stored by the assembler with each block of loader input. The computation of a checksum is quite easy: the sum of the entire block is computed (ignoring overflows). This sum (the checksum) is generally stored as the last word of the loader record. When the block is read back in, the checksum is computed on the input and compared with the stored checksum. If they match, no error has occurred; if they do not match, a checksum error has occurred.

To show that a checksum helps to prevent errors, assume that any one value changes. Then the sum of the values changes and so the stored checksum will not match a checksum which is computed when the block is read back in. Thus the error is detected. For an error to be undetected, multiple changes will need to occur in such a way that the arithmetic sum is unchanged; this is very unlikely.

A checksum is strictly an error detection technique. If a checksum error occurs, it is not possible to know which word (or words) have changed, or from what. In order to correct the error, it is necessary to go back to the original assembly language source and reassemble, producing new (and hopefully correct) input values for the loader.

An absolute loader

With all of the considerations of the above sections on loader formats and error checking, we can now turn to writing the code for an absolute loader. Our general algorithm is

An absolute loader is heavily I/O-bound and so double buffering should be used. Also, the loader format is device independent and can be used with any device which can store words. Thus, for modularity we use a subroutine to properly handle all I/O. The subroutine, whenever it is called, returns the next word of the loader input. Thus, if a different I/O device is used, only this one subroutine need be changed. The subroutine will return the next word in the A register.

* * DATA FOR INPUT SUBROUTINE * TAPE EQU 0 DEVICE NUMBER ADDR ORIG *+1 ADDRESS OF NEXT WORD TO RETURN COUNT CON 0 COUNT OF NUMBER OF WORDS BUF1 ORIG *+100 BUFFERS FOR BUF2 ORIG *+100 DOUBLE BUFFERING INPUTBUF CON BUF1 BUFFER TO READ INTO USINGBUF CON BUF2 BUFFER TO USE * * * INPUT SUBROUTINE FOR MAGNETIC TAPE. * 100 WORD PHYSICAL RECORDS ARE READ FROM * TAPE AND RETURNED TO THE LOADER ONE WORD * AT A TIME, IN A REGISTER. DOUBLE BUFFERING * IS USED. NO REGISTERS, OTHER THAN A, ARE * CHANGED. * * FIRST IN BUF1(TAPE) SHOULD BE EXECUTED IN MAIN * PROGRAM TO START DOUBLE BUFFERING. * INPUT STJ EXITINPUT ST1 TEMP1(0:2) ST2 TEMP2(0:2) SAVE REGISTERS * LD1 COUNT CHECK IF WORDS LEFT IN BUFFER J1P RETURNWORD * * WORDS IN USINGBUF ALL GONE. INPUT MORE INTO * THAT BUFFER, AND SWITCH BUFFER POINTERS. * ALSO RESET COUNT AND ADDRESS FOR RETURN WORDS. * LD1 USINGBUF IN 0,1(TAPE) READ NEW BLOCK LD2 INPUTBUF ST2 USINGBUF SWITCH BUFFER POINTERS ST2 ADDR ST1 INPUTBUF BUFFER BEING INPUT TO ENT1 100 COUNTER * * RETURN THE NEXT WORD. * ADDRESS IN I2, COUNT IN I1 * RETURNWORD LD2 ADDR ADDRESS POINTER FOR NEXT WORD LDA 0,2 INC2 1 SET FOR NEXT WORD ST2 ADDR DEC1 1 AND DECREMENT COUNTER ST1 COUNT * TEMP2 ENT2 * TEMP1 ENT1 * EXITINPUT JMP *

The loader itself is now very straightforward coding. The major coding is actually error checking and messages.

LP EQU 18 * ORIG 3700 FIRST CON * * * DATA AND VARIABLES FOR LOADER * CHKSUM ORIG *+1 FOR CHECKSUM COMPUTATION HEADER ORIG *+1 HEADER FOR ERRORS * * ERROR MESSAGE PRINTING * CHKMSG ALF CHECK ALF SUM E ALF RROR HDRMSG ALF ERROR ALF IN H ALF EADER * * ERRORLINE ALF *** ERRORMSG ORIG *+3 SPACE FOR MESSAGE ALF ALF BLO ALF CK = ERRORBLK ORIG *+1 ALF ALF HEAD ALF ER = ERRORLA ORIG *+1 LOAD ADDRESS ALF (LA), ALF ERRORN ORIG *+1 NUMBER OF WORDS ALF (N) ORIG *+8 * * * AN ABSOLUTE LOADER FOR A MIX MACHINE * * LOADER INPUT IS ASSUMED TO BE ON TAPE 0 AS A * SEQUENCE. OF BLOCKS. THE FIRST WORD OF EACH * BLOCK IS A HEADER, CONTAINING THE NUMBER OF * WORDS TO BE LOADED (N), AND THEIR LOAD * ADDRESS (LA), FOLLOWED BY THE WORDS TO * BE LOADED AND A CHECKSUM. AN EMPTY BLOCK * (N = 0) MEANS THAT LA IS THE STARTING ADDRESS. * * * INITIALIZATION * LOADER IN BUF1(TAPE) START DOUBLE BUFFERING ENT3 0 BLOCK COUNTER * * START NEXT BLOCK. INPUT LOAD ADDRESS AND * NUMBER OF WORDS. INITIALIZE CHECKSUM. * INCREMENT BLOCK NUMBER STORED IN I3. * NEXTBLOCK JMP INPUT STA CHKSUM INITIAL CHECKSUM VALUE STA HEADER SAVE HEADER FOR POSSIBLE ERROR INC3 1 * * USE I1 FOR LOAD ADDRESS * AND I2 FOR NUMBER OF WORDS * LD1 HEADER(0:2) LD2 HEADER(4:5) J1N HDRERROR CHECK FOR 0 ≤ LA CMP1 FIRST AND LA < LOADER JGE HDRERROR * ENTA -1,2 N-1 INCA 0,1 LA+N-1 CMPA FIRST JGE HDRERROR * J2Z 0,1 IF N=0, I1 IS START ADDRESS * * LOAD THE NEXT N WORDS INTO LA, LA+1, ... * COMPUTE THE CHECKSUM AS WE GO. * LOADLOOP JMP INPUT GET WORD STA 0,1 LOAD INTO MEMORY INC1 1 ADD CHKSUM STA CHKSUM DEC2 1 J2P LOADLOOP CONTINUE FOR MORE WORDS * * BLOCK IS LOADED. GET CHECKSUM AND COMPARE * JMP INPUT CMPA CHKSUM JE NEXTBLOCK * * ERROR HANDLING. PRINT MESSAGE AND HALT * CHKSUMERR ENT2 CHKMSG CHECKSUM ERROR JMP ERROR PRINT AND HALT ROUTINE * HDRERROR ENT2 HDRMSG HEADER ERROR * JMP ERROR FALL THROUGH TO ERROR ROUTINE * * ERROR ROUTINE. FORMAT ERROR MESSAGE WITH * MESSAGE (ADDRESS IN I2), BLOCK NUMBER (IN I3), * LA AND N (FROM HEADER). PRINT AND HALT * ERROR ENT1 ERRORMSG MOVE 0,2(3) MESSAGES OF 3 WORDS ENTA 0,3 BLOCK NUMBER CHAR STX ERRORBLK LDA HEADER(0:2) CHAR STX ERRORLA(2:5) LDA HEADER(4:5) NUMBER N CHAR STX ERRORN(2:5) OUT ERRORLINE(LP) PRINT LINE HLT * END LOADER

Some minor points still need to be considered for the assembly of the loader. The loader should be ORIGed to the end of memory, but where is that? There are two ways to tell. One is to count the number of words used by the loader, by hand. The other is to submit the program to the assembler and then look at the output listing. Either way, we can then compute the length of the loader and the proper ORIG address.

A more important problem is testing the loader. The loader is one of the most often used programs in the system and because of this should be very carefully coded. It is wise to both examine the written code by hand for possible errors and also to carefully test the code. A special driver program should be written to produce tapes for testing the loader. These test tapes should include both legitimate loader input and all possible error cases. The performance of the loader should be checked for both correctness, speed, and usefulness of error messages.

7.2 RELOCATABLE LOADERS

Absolute loaders have a number of advantages: they are small, fast and simple. But they have a number of disadvantages, too. The major problem deals with the need to assemble an entire program all at once. Since the addresses for the program are determined at assembly time, the entire program must be assembled at one time in order for proper addresses to be assigned to the different parts. This means that a small change to one subroutine requires reassembly of the entire program. Also, standard subroutines, which might be kept in a library of useful subroutines and functions, must be physically copied and added to each program which uses them.

There is one possible alternative to constant reassembly of unchanged subroutines. Given an estimate of the lengths of the subroutines involved, we can ORIG each subroutine separately, and assemble each one individually. Then we can load the subroutines into their predefined locations by feeding the separate loader files into the loader. A change in a single subroutine requires reassembly of only that subroutine and a new loader file to be loaded with the others which need not be reassembled.

The approach can work, but only with care, foresight, and a lot of luck. The problem is in knowing how big each subroutine will be. Since the subroutines cannot be allowed to overlap, we must not estimate too little, so to be safe, we must overestimate. This results in wasted, unused space between the subroutines. Also, if a subroutine needs changing it may grow in size. This may result in it overlapping the subroutine which follows it in memory, which would require reassembly of that routine with a different ORIG statement, but this might cause that subroutine to overlap the subroutine following it, and so forth.

For library subroutines, this can be even a greater problem. At the time that the library routines are written, it is nearly impossible to decide where they should go in memory. The subroutine should be placed in memory so that it does not overlap any other subroutine, user-written or library, which may be in memory at the same time, and so that little memory is wasted between routines. For any sizable collection of subroutines, this is a nearly impossible task.

The problem here is one of binding time. Remember that one of the purposes of assembly language is to allow symbolic addresses to be used. These symbolic addresses are being bound to physical addresses at assembly time. To bind an address is simply to associate it with a physical memory location. The assembler is, using ORIG statements and its symbol table, assigning physical addresses to symbolic addresses. But this negates one of the possible benefits of assembly language, the ability of the assembly language programmer to ignore physical machine addresses. The binding of symbolic addresses to physical addresses is really important only when the program is actually loaded into memory. Thus, we would like to delay address binding until load time.

A relocatable loader is a loader which allows this delay of binding time. A relocatable loader accepts as input a sequence of segments, each in a special relocatable load format, and loads these segments into memory. The addresses into which segments are loaded are determined by the relocatable loader, not by the assembler or the programmer.

Each segment is a subroutine, function, main program, block of global data, or some similar set of memory locations which the programmer wishes to group together. Segments are loaded into memory one after the other, to use as little space as possible. The relocatable load format is defined so that separate segments can be assembled or compiled separately and combined at load time.

Relocation

The relocation implied in the name "relocatable loader" refers to the fact that on two separate loads, the same segment can be loaded into two different locations in memory. If any of the segments which are loaded into memory before a segment change in size due to recoding and reassembly between the two loads, then the addresses in memory into which the segment is loaded will change by the same amount.

BEGIN LD2 LENGTH LOOP IN BUFFER(16) OUT BUFFER(18) DEC2 1 J2P LOOP HLT LENGTH CON 10 BUFFER ORIG *+24 END BEGIN

Knowing this, how does a change in the values of the symbols affect the code generated by the assembler? The symbols which refer to addresses are used in the address field of an instruction. Thus, changing the value of a symbol, as a result of changing the base address for the program, means changing the address field of any instruction which refers to an address in the program.

For our sample program above, the addresses generated, assuming that the program starts at zero, would be

code			address generated
`BEGIN`	`LD2`	`LENGTH`	6
`LOOP`	`IN`	`BUFFER(16)`	7
	`OUT`	`BUFFER(18)`	7
	`DEC2`	`1`	1
	`J2P`	`LOOP`	1
	`HLT`		0
`LENGTH`	`CON`	`10`	10
`BUFFER`	`ORIG`	`*+24`
	`END`	`BEGIN`	0

If we were to start the program at 1000, the addresses would be

code address generated BEGIN LD2 LENGTH 1006 LOOP IN BUFFER(16) 1007 OUT BUFFER(18) 1007 DEC2 1 1 J2P LOOP 1001 HLT 0 LENGTH CON 10 10 BUFFER ORIG *+24 END BEGIN 1000

CR EQU 16 LP EQU 18 N EQU 10

IN BUFFER(CR) OUT BUFFER(LP) LENGTH CON N

The dichotomy between absolute and relocatable symbols means that when a program is loaded into memory with a base other than zero, some, but not all, of the address fields must be changed by addition of the correct base address. These addresses must be distinguished in some manner, so that the loader can correctly modify them during loading. Various techniques for this are used, and we discuss them shortly. The addresses used in the program when assembled are relocatable or relative addresses, since they are only the addresses relative to the base address.

The general flow of a relocatable loader can now be defined. A variable, LOADBASE, stores the value of the base address for the segment currently being loaded. As that segment is loaded, all relative addresses are modified by the addition of the base address. When the segment is loaded, LOADBASE is increased by the length of the segment to be ready for use as the base address of the next segment. Thus segments are loaded into memory one after another, with no overlap and no wasted space, just as desired.

Linking

At this point we might pause a moment to consider the usefulness of a relocatable loader. The objective is to allow separate subroutines and functions to be prepared and assembled separately and loaded together. The relocatable loader does this, but it provides no means for one subroutine to call another. Consider the three assembly statements for calling a subroutine SUM with two parameters

ENT1 ARRAY ENT2 10 JMP SUM

Thus there are three types of addresses in a relocatable program: absolute, relative, and external. The treatment of absolute and relative addresses has already been discussed. Let us now consider externals. An external symbol in one segment must be an entry point in some other segment. For example, the subroutine name SUM used above as an external is an entry point in a separate segment. The loader must link up the reference to the external symbol SUM with the definition of the entry point SUM in a separate segment. The linking means that the address of SUM, in its segment, must be stored in the address field of the instruction which referenced SUM as an external, in its segment. This linking function is performed by a linking loader, or simply, a linker. Since it makes no sense to relocate programs in memory unless they have the ability to reference each other (which is what linking allows), most relocatable loaders are relocatable linking loaders. We follow standard practice by referring to a relocatable linking loader as a relocatable loader.

Relocatable assembly language

The changes that are necessary for use of a relocatable loader have implications for the assembly language which produces the input for the loader. The MIXAL language which is used on the MIX computer is an absolute assembly language; it is used in connection with an assembler which produces loader input for an absolute loader. It has no provisions for generating relocatable loader code. A relocatable MIXAL would vary in several respects.

One variation would be in the ORIG statement; it would be allowed in only limited ways, if at all. A programmer would no longer be able to ORIG her code to an arbitrary absolute address. The only uses which would be permitted would be relative ORIGs, like ORIG *+10, ORIG N*2+*, and so forth. Many assemblers thus replace the ORIG statement with a BSS statement. The BSS statement stands for "Block Storage Save" and takes one expression in its operand field. A BSS 10 is identical to an ORIG 10+*, a BSS N*2 to an ORIG N*2+*, and so on.

Another change is in allowed address expressions. Each symbol is either absolute or relocatable. Expressions can also be typed as absolute or relocatable. An absolute symbol is an absolute expression, and a relocatable symbol is a relocatable expression. The sum, difference, product, and quotient of absolute expressions are absolute expressions. Relocatable expressions are more difficult to define. A relocatable expression is one whose final value depends upon the base address in the same way as a relocatable symbol. The binding from relocatable to absolute should be a simple addition of a base. Let R be a relocatable symbol or expression, and let A be an absolute symbol, constant, or expression. Then the expressions, R+A, R-A, and A+R are relocatable. An expression like R-R is absolute. Expressions like R+R, R*R, R/R, A*R, or R/A are neither relocatable nor absolute and hence are not allowed.

Notice that either R or A can be an expression, and so expressions like R-R-A+R+A-R+R are allowed (and are relocatable). To determine if an expression is either relocatable or absolute, replace all relocatable symbols by R and all absolute symbols by A. Then check that no relocatable symbols are involved in multiplications or divisions. Finally combine subexpressions such as R+A, A+R, R-A and substitute with R, and A+A, A-A, A/A, A*A, and R-R, substituting with an A until the entire expression is reduced as far as possible. If the result is either R or A, the expression is valid and of the indicated type; otherwise it is illegal.

One other change is concerned with externals and entry points. To the assembler of a segment, any reference to an external will appear to be a reference to an undefined symbol, which is not what is meant. One possible approach to this would be to treat all undefined symbols as externals, but this would result in truly undefined symbols being treated incorrectly. Thus a new pseudo-instruction is introduced which is used to notify the assembler that a symbol is meant to be an external symbol, not an undefined one. This new pseudo-instruction, EXT, could be of the form

EXT <list of external symbols>

Corresponding to the externals are entry points and they are treated in much the same way. In order for the loader to correctly link externals with the corresponding entry point, the loader must know where the entry points are. Thus assembly languages often include an entry point declaration pseudo-instruction ENT. The form of this pseudo instruction is often

ENT <list of entry point symbols>

In addition to these programmer-visible changes in assembly language programs, the assembler itself must also produce relocatable, not absolute, loader code.

Relocatable load format

Many different formats can be used for relocatable loader input and each of the formats may have its own advantages and disadvantages. The basic information which must be provided by the assembler to the loader is,

As a simple approach to the problem, we can use a loader format very similar to the absolute loader format of Figure 7.2 above. All input to the relocatable loader will consist of a sequence of codes which identify what information follows it. The codes will be stored in byte 3 of a header word, and are,

Code		Meaning
0		Start of a segment
1		Absolute address (no relocation needed)
2		Relocatable address (needs `BASE` added)
3		Entry point symbol
4		External symbol
5		Starting address
6		End of segment
7		End of loader input.

The remainder of the header word will contain various information depending upon the code. Specifically,

Code	Meaning
0	Start of segment. A segment name of up to 10 characters follows in the next two words.
1	Absolute. Bytes 0:2 of the header contain a relative address. The following word is an absolute value which should be loaded into memory at the relative address.
2	Relocatable. Bytes 0:2 of the header contain a relative address. The following word should be loaded into that location and the address field of the loaded value should be modified by the addition of the base address of the current segment.
3	Entry point. The next two words contain the 10-character symbolic name of an entry point. Its relative address is in bytes 0:2 of the header word.
4	External. The next two words contain the 10-character symbolic name of an external. The instruction (which must already have been loaded) which is at the relative address given in bytes 0:2 of the header uses this external.
5	Starting address. Bytes 0:2 contain the relative address in the current segment where execution should begin.
6	End of segment. Bytes 0:2 contain the length of the segment.
7	End of loader input. Control can now be transferred to the starting address.

As before, with our absolute loader input, we should consider the efficiency of this representation. Each instruction to be loaded here takes two words, at the least, plus additional words for segment definitions, entry points, and externals. This needs to be reduced if possible. There will probably be few entry points, externals, starting addresses or segments, so we can ignore those codes for now and concentrate on the representation of codes 1 and 2; these will be the bulk of our loader input.

With the input to the absolute loader, it was reasonable to assume that there would be a large number of contiguous locations to be loaded, and that one header word could serve a block of contiguous words as well as one word. For relocatable input, that is not strictly true, however. While it is reasonable to assume that blocks of contiguous words will need to be loaded, it is not reasonable to assume that blocks of contiguous absolute words or blocks of contiguous relocatable words will be loaded. Rather, it is more likely that relocatable and absolute words will be intermixed.

With the exception of adding the base address to the address field of a relocatable instruction, however, relocatable and absolute instructions can be treated the same. Thus, rather than an entire header word, only one extra bit per instruction is needed to distinguish between relocatable and absolute instructions. If that bit were supplied separately, then the loader could accept blocks of input instructions, loading them as if they were absolute, and then later returning to convert the address fields of some instructions from relative to absolute by the addition of the base address. This is the approach we take here. To reduce the size of the loader input, and consequently the amount of I/O time needed to load it, we allow each header word to prefix a block of words. The number of words is given in bytes 4:5 of the header word. Thus, we redefine the code 1 input as

Code	Meaning
1	Absolute. Bytes 0:2 of the header contain a relative address; bytes 4:5 contain a number of words (`N`). Load the `N` following words into memory at the relative address given in 0:2. Some of these words may be later modified as external or relocatable references.

Code 2 input must be redefined in light of this approach. What is needed is simply the (relative) addresses of all instructions which need relocation. One approach would be to simply list the addresses of the relocatable instructions, one after another, with bytes 4:5 specifying the number of such addresses. The addresses could also be packed two to a word, to reduce the amount of space needed.

A slightly different approach is more commonly used on binary computers. From examinations of typical programs, you notice that most instructions use an address and most of these are relocatable. Thus, many instructions are relocatable and contiguous to other relocatable instructions. Hence a list of addresses of relocatable instructions would be a list of addresses which could typically differ only by one or two locations. Thus rather than an address for each, we need only the address of the first (which can be stored in bytes 0:2 of the header) and an indicator, for each of the following words, whether it is or is not relocatable. This requires only one bit per word, so that the need for relocation of 30 words can be packed into one 5-byte word on a binary MIX machine. Consider the binary word

b₂₉ b₂₈ b₂₇ … b₁b₀

1H JAE 2F LOW ORDER BIT IS ZERO IF A EVEN LD3 0,1(0:2) ADDRESS FIELD FOR INC3 0,6 RELOCATIONS ST3 0,1(0:2) 2H SRB 1 SHIFT A RIGHT ONE BIT INC1 1 INCREMENT ADDRESS JANZ 1B A NONZERO MEANS STILL 1 BIT

Whichever representation is used for the code 2 information on relocation addresses, it should be clear how to write the code for relocating addresses in a program to be loaded. If we assume the list of addresses approach, then the loader code for the subroutine

NAME SUM10 ENT SUM10 ENTRY POINT DEFINITION EXT ARRAY EXTERNAL TEM1 BSS 1 * SUM10 STJ EXITSUM ST1 TEM1 ENT1 10 ENTA 0 LOOP ADD ARRAY-1,1 DEC1 1 J1P LOOP LD1 TEM1 EXITSUM JMP * END

+0000 00 0002 CODE 0: START OF SEGMENT +2630163736 NAME OF SEGMENT: SUM10 +0000000000 +0001 01 0011 CODE 1: 9 ABSOLUTE WORDS +0011 00 0240 TO BE LOADED AT RELATIVE +0001 +0000 00 0231 +0012 00 0261 +0000 00 0260 -0001 01 0501 +0001 00 0161 +0005 00 0251 +0000 00 0211 +0011 00 0047 +0000 02 0005 CODE 2: LIST OF 5 ADDRESSES +0000 00 0001 TO BE RELOCATED +0000 00 0002 +0000 00 0007 +0000 00 0010 +0000 00 0011 +0001 03 0002 CODE 3: ENTRY POINT AT RELATIVE +0001 +2630163736 NAME OF ENTRY POINT SUM10 +0000000000 +0005 04 0002 CODE 4: EXTERNAL USED AT RELATIVE +0005 +0123230134 NAME OF EXTERNAL ARRAY +0000000000 +0012 06 0000 CODE 6: END OF SEGMENT OF LENGTH 10 WORDS

Techniques for linking

The programming techniques which are used for linking entry points to externals are similar to techniques used in other system programs, particularly assemblers and compilers. Both entry points and externals are defined by their symbolic names. The problem is to match the definitions of these symbols, as entry points, to their uses, as externals.

This matching is performed by the use of a data structure called a symbol table, or in this case, an entry point table. This table is a list of the known entry points, both their symbolic name and their address in memory. This table is defined by the collection of all code 3 entry point definitions in the input. As each entry point definition is found, the name of the new entry point and its absolute address are entered into the entry point table. When a reference to an external (code 4) is found, the entry point table is searched for the correspondingly named entry point, and the address of that entry point is then used to modify the address field of the instruction which used that external/entry point.

This solution to the linking problem works very well, except for one problem: forward references. If an external reference to an entry point is encountered before the definition of that entry point is encountered, then the search of the entry point table will not find an address for the external. Since an entry point can be declared after it is used as an external, it cannot be known for certain if an external reference which finds no corresponding entry point is a reference to an undefined entry point (an error) or simply a reference to an entry point which is yet to be defined (a forward reference). This can be known only when the end of loader input (code 7) is encountered. At that point, any undefined externals are errors.

As always, there are many techniques for solving the problem of forward references. The simplest is to use a two-pass loader. After inputting all of the loader input once, all entry points are defined. On the first pass through the loader input, we can ignore all loader input except the entry point definitions and the length of the segments (so we can convert the relative addresses of the entry point definitions to absolute addresses). When the end of loader input is encountered, the loader input is read a second time (pass two). This time the segments are loaded into memory, and all external references are resolved. Forward references will not be a problem, since all entry points were defined and entered into the symbol table during pass one. Any forward reference during pass two is a reference to an undefined external, and is an error.

The mechanics of a two-pass loader depend upon the I/O devices involved. If the loader input is on a mass storage device (magnetic tape, disk, or drum), then the loader can reread the input by rewinding or repositioning the input to the beginning for the second pass. If input is on cards or paper tape, it is often copied to a mass storage device during pass one, and then read back from this device during pass 2. If no mass storage is available, the loader input must simply be read into the computer twice.

A two-pass solution to forward references is conceptually simple, and hence easy to write. However, it requires twice as much I/O as might be desired, and since loading is an I/O-bound process anyway, will produce a slow loader. This is quite unfortunate, since the loader is so frequently used. Thus, other approaches have been tried to reduce the amount of I/O which is necessary.

One variant is to notice that all of the loading functions (storing in memory, relocation, entry point table definition) can be accomplished in one pass, except external reference resolution. Even this can be done except for forward references. Thus, instead of inputting the entire loader input twice, on pass one we do all possible loading. If forward external references are encountered, then the (absolute) address of the reference and the name of the referenced external are written to mass storage. Pass two consists of nothing more than rewinding this device and rereading it. The amount of I/O during pass two is reduced considerably.

A more radical approach is to eliminate the second pass entirely, rather than simply trying to reduce it in size. In this case, rather than storing the external references on a mass storage device, they are stored in memory. Two tables are constructed during pass one, an entry point table, and an external reference table (often call a use table). The external reference table stores, for each (forward) external reference, the name of the external and the (absolute) address where it is used. Pass two now no longer need do any I/O, and need not even be a separate pass. During pass one, if a forward external reference is found, then it is entered into the external reference table. When an entry point is defined, the external reference table is searched, and any references to this newly defined entry point are resolved (i.e., the address of the entry point is used to modify the address field of the referencing instruction). The entry in the external reference table is then deleted. When pass one is completed, the external reference table should be empty. Any remaining entries are references to externals which were never defined as entry points.

The problem is now reduced to the construction and maintenance of the two tables, the entry point table and the external reference table. The entry point table is an easier table to construct and maintain since deletions are never made from it. Thus only search and enter subroutines are needed. The external reference table is much more difficult. The table must be able to add entries, search for entries, and delete entries. This requires care that deleted entries are not used after they are deleted. Also, the memory space they occupied should be reused if possible to reduce the total amount of memory needed. This requires very careful design of an appropriate data structure.

Some loaders have merged the entry point table and the external reference table into one table. Each entry in this table consists of several fields. One field is a name, another is a bit. If the bit is 0, then the name is the name of an external reference which has not yet been defined as an entry point (a forward external reference). If the bit is 1, the name is the name of an entry point. For an entry point, we need an absolute address for that entry point. For an external reference, we need a list of all the addresses which reference that external. The problem is that the length of this list is variable and unbounded. How much space should be allocated for the list?

Since the list is of variable length, the standard approach is to use a linked list. In the table entry itself, we do not store the list, but only a pointer to the list. This means that each table entry is a name, an address, and a bit. The bit distinguishes between forward external references and entry point table entries. For forward external references, the word is the external name and the address is the address of a list of addresses which reference the external; for entry points, the word is the entry point name and the address is the address of the entry point in memory.

Where should the list be stored? One approach is to store the list in a large array of available memory. Another is to restrict the assembly language, so that external references cannot be used in expressions, but only by themselves. Thus LDA EXTERNAL is acceptable, but LDA EXTERNAL+1 is not. In this case, resolving an external reference consists in simply storing the address of the entry point in the address field of the instruction. That means that the address field has no necessary information and hence can be used as temporary storage until the external reference is resolved. This allows the list of external references to be stored in the address fields of the instructions which reference the externals. This technique is known as chaining.

In chaining, the first reference to a forwardly referenced external stores in its address field a special value (like 4095 or -1 or any other value which cannot be interpreted as a legal address). The entry point/external table entry points to this by saving the address of this instruction. When the loader finds the next reference, it sets the address field of the second reference to point to the first reference, and the address of the second reference is stored in the table. The third reference would point to the second reference, and so on. Thus, when the entry point is finally defined, the table entry points to the last reference, which points to the previous reference, …, which points to the first reference, which has a special easily recognizable pointer. The loader must follow down this chain of addresses, replacing them all with the address of the entry point.

Another technique is to set all external reference addresses to refer indirectly through the entry point table entry. The address of the entry point table entry can be defined as soon as the name of the symbol is known. Since at execution time the addresses of all entry points will have been put into the entry point table, every access to an external can simply be made indirectly through the entry point table. In this case the entry point table is an example of a transfer vector.

All of these linking techniques have their advantages and disadvantages. A two-pass loader is relatively simple, but takes more time than a one-pass loader. The transfer vector approach is straightforward, but requires additional memory at execution time for the transfer vector and additional time due to the indirect references. Indexing or indirection on externals may not occur correctly with the transfer vector approach. Chaining can be done in one pass without additional memory, but the coding for resolving the chain must be carefully written to avoid errors, and the use of external references must be restricted. The use of an external reference table may require considerable memory for the table, but allows the loader to produce a cross reference table for entry points and externals, showing where every entry point is and the addresses of all of its uses.

An entry point cross reference listing generally accompanies a memory map. A memory map lists the names, lengths, and load address of all segments and entry points. A memory map and cross reference listing can be very valuable aids in debugging a program.

Relocatable load errors

As with the absolute loader, a relocatable loader must check carefully for possible errors in loading. The errors which an absolute loader checked for were relatively obvious, while the errors of a relocatable loader are more complex. Certainly the relocatable loader must insure that all absolute addresses are legal and do not overlap the relocatable loader. To assure this, the relocatable loader is again generally placed in high memory, while user programs are loaded into low memory. The tables for the loader are of variable size and so generally start just below the loader and grow towards the user's programs. Thus a large program with few symbols or a small program with many symbols will have no problem being loaded.

In addition, some other errors which may be detected by a relocatable loader include,

Other strange conditions may or may not be errors depending upon your philosophy. Should one segment be allowed to declare the same symbol both external and an entry point?

7.3 VARIATIONS OF LOADERS

Linkage editors

A variation on the idea of a relocatable linking loader is a linkage editor. A linkage editor can be thought of as a relocatable linking non-loading loader. Its purpose is to input a number of segments in a relocatable load format, and to output a new segment which is the combination of the input segments. In so doing, some relocation is done, and some linkage.

As an example, consider two segments, MAIN and ZERO. MAIN is 43 words long, has one entry point, START, and references the externals ONE and TWO. ZERO is 15 words long, has two entry points, TWO and ZERO, and references the externals START and STOP. A linkage editor which receives as input MAIN and ZERO would produce as output a new segment which would consist of both segments MAIN and ZERO, so it would be 58 words long. In addition, all addresses in ZERO would be relocated by 43 words, so that they would be correct for following MAIN. Any externals in one segment which are entry points in the other would be resolved and would no longer appear as externals, while all the entry points of both segments would appear as entry points of the new segment. Thus, the new segment would have three entry points, START, TWO and ZERO. It would reference two externals, ONE and STOP. The references to TWO in MAIN would be linked to the entry point in ZERO, the references to START in ZERO would be linked to the entry point in MAIN. These references would now appear like any other relocatable memory references.

The reasons for wanting a linkage editor vary. From a user point of view, the repetitive linking of many routines can be quite expensive. If these routines are not changing, then they could be linked together, once and for all, by the linkage editor, while the remainder of the program is modified, reassembled, relinked, and debugged as needed. Thus a linkage editor may save valuable time.

From a system point of view the use of a linkage editor can separate the linking function from the loading function. That is, a user may need to run his entire program first through the linkage editor, until no externals remain to be linked. Then, although the program remains in relocatable format, the relocation for it for loading is zero. Thus it can be loaded by what is essentially an absolute loader. This means the loader can be considerably smaller and faster than a relocating linking loader, since it need do no linking (and generally no relocation).

A linkage editor is normally a two-pass linker. Since the program is not being loaded into memory, we cannot use chaining or an in core fix-up for externals. We must make one pass first to define the entry point table and then another pass to link and relocate the segments together.

Overlays

Sometimes a program is too large to fit in memory. There is only a fixed amount of memory on a MIX machine and some problems can only be solved by programs which require more than 4000 words of memory. However, often it is the case that the program can easily be partitioned into separate sections which are never needed at the same time. In this case, if a mass storage device is available, the program may still be able to be written and executed.

The idea of overlaying is to store the entire program only on the mass storage device. In memory is kept only the common (global) data that the sections use and any heavily used subroutines. At any given time, only one of the sections will be in main memory. When control needs to transfer from one section to another, a jump is made to a small special subroutine called an overlay monitor. The overlay monitor loads the next section from the mass storage device into memory in the same memory locations as the previous section. The sections are called overlays.

On the MIX machine, for example, assume that the common data and overlay monitor took 2000 words of memory and each of three overlays took 2000 words of memory. Although only 4000 words of memory are available, an overlay system could run this program, although it would take 8000 words to run without overlays. (What we gain in space, we lose in time, since bringing in overlays requires considerable input/output time.)

Loaders are involved with overlays in two ways. First, the overlay monitor is obviously a loader of sorts, prepared to load any of several overlay sections as directed. Second, a linkage-editor-type loader is generally used to link and relocate the program into the form needed for the overlay monitor on the direct access device.

Bootstrap loaders

Suppose the computer is executing your program. You now know how that program got into memory. It was loaded by the relocatable linking loader. You also know (or can guess) that the relocatable loader was loaded into memory by an absolute loader. How did the absolute loader get into memory? It was obviously loaded, but by what?

On most machines someone has written a small special purpose loader program which is meant only for loading the absolute loader. This program is used to load the absolute loader. But, you continue, how did this program get into memory? Answer: it loaded itself into memory (with a little help from its friends).

The bottom line in loaders is a special program called a bootstrap loader. A bootstrap loader loads itself into memory. Obviously, however, it cannot do this all by itself. The first few instructions of a bootstrap loader are I/O instructions which read into memory from a specific I/O device. The I/O is done into the memory locations which directly follow these few instructions. The information read in by these few instructions is the remainder of the bootstrap loader. For example, if the bootstrap loader is written on a magnetic tape on tape drive 1, then the following instructions are enough to get the bootstrap loader started.

IN *+2(1) JBUS *(1)

These first few instructions were loaded into the computer by hand, the computer operator, when a computer is first turned on, must enter these instructions, by hand, in binary, through the console of the computer. When these instructions are executed, they read in the remainder of the bootstrap loader, which then loads the absolute loader, which then loads the relocatable loader, which then loads your program.

Many computers now have special options which provide a bootstrap loader in special memory which can be read but not written (read-only memory), so that the loader cannot be clobbered, intentionally or unintentionally. This relieves the operator of needing to enter the bootstrap loader by hand.

Subroutine libraries

On most computer systems there exists a set of subroutines which have been written and are used by many programmers. Subroutines for common mathematical functions and for input/output are typical. This set of subroutines is often grouped into a library. A loader uses this library to try to satisfy any unsatisfied externals when the end of a relocatable load results in undefined externals.

A programmer can use any of these subroutines simply by declaring them to be external. When loading of user input is complete, the loader checks for any unsatisfied external references. If there are any, each reference causes the loader to search the library of subroutines, looking for an entry point for any of these subroutines which matches the unsatisfied external. If none is found, we have an error as before. If one is found, however, it is then loaded into memory and the external reference resolved.

Since blind searching of the entire library may be quite time consuming, the library is generally structured so that the searching can be more efficient. The names of all entry points and the name and location of their segments are stored in a directory which is at the front of the library. It is possible to then search for an entry point and load it very quickly.

7.4 SUMMARY

As you can tell, there are a number of different approaches to loading a program into memory. Typically, the standard loader is either an absolute loader or a relocatable loader. The absolute loader is very simple and may be all that is needed or can be afforded for small computers. A relocatable loader is more complicated but allows programs to be written as separate subroutines which may be assembled or compiled by different translators at different times.

The relocatable loader must combine the many different segments, relocating addresses and linking external references and entry points. There are many different possible input formats and the format produced by the assembler must be appropriate for the loader to be used. A relocatable loader may use many techniques for linking as either a two-pass or a one-pass loader. If the loader is one-pass, it may link through a transfer vector, chaining, or an external reference table.

A linkage editor links many small segments into one large segment, with the same output format as input format. Thus the output of a linkage editor may be input into a relocatable loader or back into itself for combination with other segments.

Several books include a chapter on loaders and linkers, including Donovan (1972), Graham (1975), and Ullman (1976). Barren (1969) gives a brief treatment of loaders, in general, while Presser and White (1972) concentrate on the loader and linkage editor for the IBM 360/370 systems. Knuth(1968) gives a bootstrap loader for MIX in Volume 1.

`0000`	`ABS`	`+0000000000`
`0003`	`REL`	`+0000000510`
`0004`	`REL`	`+0001000530`
`0005`	`ABS`	`+0001000260`
`0006`	`REL`	`+0002003430`

LOADERS AND LINKERS