ASSEMBLY LANGUAGE PROGRAMMING TECHNIQUES

In some sense, the programming process is independent of the language which is used for programming, but in a more realistic sense different languages encourage different programming techniques to be used. The machine which is being used for assembly language programming, since it affects the instruction set, also affects the type of programming which can be easily done. The result of this is that there is a body of general programming techniques which can be used for programming in assembly language for most machines, but the details must depend upon the particular machine being used. The basic concepts and techniques are the same for most computers. In this chapter we present some of these techniques to illustrate how to program in assembly language. By and large, many of these techniques will only be learned by their use, however, and as you program you will learn new techniques which are not presented here.

Writing a program is an art, and no one way to program is correct for ail situations. Recent discussions have lead to general agreement on several points, however. First, an important distinction exists between programming and coding. Programming is the process of converting a vague description of a problem into a specific algorithm for solving the problem. This involves defining appropriate data structures, determining what information is to be input and output and the flow of control within the program.

Defining the flow of control can be helped by the use of top-down programming techniques. A top-down approach starts with a general statement of the problem and successively refines its components by considering each of these to be a separate problem. This is somewhat difficult to explain, but easier to illustrate. Look back at how the program to add two numbers together and print their sum, in Section 2.2, was developed. First the top-most level was defined: Input, Add, Output, Halt. Then each step at this level was examined and broken down into its components until each step of the program could be converted directly into machine language. Typically, the top-down refinement process continues until each step can be solved by a single machine language instruction, subroutine (see Chapter 6), or macro (see Chapter 9).

The translation of each step of a program into a machine readable format and machine instruction is coding. Coding is a necessary but not generally exciting activity.

The process of coding is itself composed of several steps. The data locations which are to be used must be decided upon and allocated. Then the instructions must be written to manipulate this data and perform the function for which the program is being written. In practice these two steps are often intermixed, with new variables being found to be needed as the result of newly written code. The data and the code must be well separated in the final program, however. A common mistake made by novice programmers is to allocate storage for variables as they find they are needed, as in

LDA B COMPUTE B + C ADD C STA D STORE IN D D CON 0 ALLOCATE STORAGE FOR D LDA E SUB D NOW SUBTRACT (B+C) FROM E
What is the problem with this? In the assembly of this code, no problem develops; six words are assembled, the five instructions and the data value for D. But during the execution of the program, the program counter will be incremented by one after each instruction is executed in order to fetch the next instruction. This will result in the LDA being fetched and executed, then the ADD, then the STA and then the next location (containing the data value for D) will be fetched, decoded and executed as an instruction. The result of this is completely dependent upon the number stored in that location by the previous STA D instruction. For example, if B = 24 and C = 15, then their sum will be 39 and this will be stored in D as
+ 00 00 00 00 47
which is a JMP to +0000!

To prevent this kind of error, programmers use several methods. One method is to program with two sheets of paper, one for the instructions and a separate one for the variables and other data items. Then if new data items are discovered to be needed while writing instructions, the declarations for them are written on the appropriate sheet with other data items. Another approach is to first sit down and write all the code, ignoring the data declarations, and then after the instructions have been written, go back and write down data declarations and allocations for all variables which are used.

In either case, one arrives at the end with two lists, the instructions and the data. These are then combined to form the program. They can be combined in either of two ways: code and then data, or data and then code. In many instances the order is not important, but it is generally best, in MIXAL, to put your data before your code. This avoids many forward references and possible limitations on the use of symbols due to this. (Remember that forward references cannot be used in ORIG, EQU, or CON pseudo-instructions, in literals or in address expressions). Also the wise use of mnemonic variable names and comments can explain the use of the variables, preparing the reader of the program for the code which follows.

We do not generally present entire programs here, but only small fragments of code. This is because we wish to focus our attention on only specific aspects of the coding problem. This allows us to present coding techniques specifically applicable to that aspect of programming under study only. This may sometimes result in undefined symbols being used. It is assumed that the reader will be able to understand from their use how they should be defined. Most often it will be a simple variable which can be defined by a CON 0 or an ORIG *+1.

4.1 ARITHMETIC

Computers were originally built to automate arithmetic calculations, and many people still believe this is their major function. Thus the coding of arithmetic expressions is a logical place to start. (In fact, actual arithmetic computation is only a minor part of what computers do. Remember that in the machine language program of Section 2.2, only 1 instruction out of 16 was an arithmetic computation; the others were input/output, conversion, loads and stores, etc). Here we assume we have a set of integer numbers upon which we wish to compute some simple arithmetic expression.

To start, consider a simple addition. We wish to compute the sum of the value of the variables CAT and DOG and store the sum in the variable FIGHT. This code might look like

CAT CON 0 DATA LOCATION FOR CAT DOG CON 0 FIGHT CON 0 ... <code to give CAT and DOG values> ... LDA CAT ADD DOG STA FIGHT SET FIGHT = CAT + DOG

If we wanted FIGHT = CAT - DOG, we have (assuming the data declarations above)

LDA CAT SUB DOG STA FIGHT SET FIGHT = CAT - DOG

For ZOO = DOG * CAT, the code needs to consider that the product of two 5-byte numbers will be a 10-byte number with the upper 5 bytes in the A register and the lower 5 bytes in the X register. If we expect that the product will always be no more than 5 bytes in magnitude, we can write

LDA DOG MUL CAT PUTS PRODUCT IN AX STX ZOO ZOO = DOG*CAT LOWER 5 BYTES
Notice that this code fragment does not worry about overflow. If we fear that the product DOG*CAT may be too large for five bytes, we can write
LDA CAT MUL DOG AX = DOG * CAT JANZ TOOLARGE CHECK FOR PRODUCT TOO BIG STX ZOO

For this code, if the product is greater than five bytes, the A register will have the upper five bytes, which will be non-zero. In this case the program will jump to the location whose label is TOOLARGE and execute the code there. If the A register is zero, the product is small enough to fit in five bytes (and is in the X register), so this is stored in the memory location ZOO. Remember, we still have to write the code for TOOLARGE. This code might be a simple

TOOLARGE HLT 0 OVERFLOW, STOP
or more complicated (and better) code which prints a message saying what went wrong and informs the user of the program what steps can be taken to correct the problem (change data, rewrite program, or such).

Division must also be coded carefully. If we want RATIO = NMEN / NWOMEN, we need the dividend (NMEN in this case) as a 10-byte quantity in register AX for the DIV operator. Normally this means putting the dividend in the X register and zero in the A register. This can be done in several ways. The major constraint in how it is done is that the sign of the dividend is the sign of the A register. Thus, if you know that your dividend is positive, division can be

LDX NMEN ENTA 0 DIV NWOMEN COMPUTE NMEN/NWOMEN
If the sign of the dividend can be either positive or negative, then code can be
LDA NMEN(0:0) PUT SIGN IN A LDX NMEN LOAD NUMBER IN X DIV NWOMEN DIVIDE NMEN BY NWOMEN
which loads the sign into the A register (setting the rest to zero) and the rest of the dividend into the X register, or the code can be,
LDA NMEN LOAD NUMBER IN A SRAX 5 SHIFT INTO X, LEAVING SIGN DIV NWOMEN DIVIDE

In this case the number is loaded into the A register and shifted into X. SRAX is an end-off shift, so zeros are shifted into the A register and the old contents of the X register are lost. The sign bits do not participate in the shift, so the sign of the dividend remains in the sign bit of the A register.

Once the division has been done, the quotient will be in the A register, and the remainder in the X register. Thus, these two quantities can be used, as desired, and the same code can be used to compute either the quotient or the remainder (or both). Remember that the DIV operator is an integer division. Fractions would require the use of floating point arithmetic.

Once these basic operations are understood, it is possible to compute much more complicated expressions, like XI = I + J*L1. This expression would be programmed to first perform the multiplication, then the addition. To do the multiplication, we would

LDA J MUL L1 AX = L1*J
Now we have J*L1, but it is in the X register. The ADD operator only adds to the A register, so we must get our result into the A register for the addition. This could be done by
STX TEMP LDA TEMP TRANSFER FROM X TO A
where TEMP is a temporary variable, or we could simply
SLAX 5 SHIFT X INTO A
which is faster (2 time units instead of 4) and shorter (1 instruction instead of 2). Notice that we are taking advantage of the sign bits of the A and X registers as set by the MUL operator. Now we can add and store. Our complete code is then
LDA J MUL L1 COMPUTE J*L1 SLAX 5 MOVE X TO A ADD I STA X1 X1 = I + J*L1

Even more complicated expressions may be computed. Consider the expression

((B+W)*Y) + 2 + ((L-M)*(-K))/Z

This is the sum of three terms. This means that at least one term must be stored in memory, in a temporary, while the other term is being computed. The addition by 2 can be done by either

ADD =2= INCREASE A BY 2
or
INCA 2 INCREASE A BY 2.

Since the latter is shorter and faster, we use it. (Shorter because although both instructions are the same length, the ADD also requires a literal, which is one memory location).

Since the first term is simpler, let us attack it first. We can write

LDA B ADD W A = B+W MUL Y AX = (B+W)*Y
Now we can add the 2, but remember the result of the MUL will be in AX, so we
INCX 2 X = (B+W)*Y + 2

This is even better than we had thought, over the ADD approach, since it prevents having to move the product from the X register to the A register for the ADD. Now we need to store this while we compute the next term.

STX TEMP SAVE PARTIAL TERM
Next we compute ((L-M)*(-K))/Z. This can be done easiest by noting that ((L-M)*(-K)) = ((M-L) * K), which removes one operation, and we can now write,
LDA M SUB L M-L MUL K (M-L) * K DIV Z ((M-L)*K) / Z.

Notice that the MUL leaves the AX register just right for a DIV. This is one reason for doing our multiply first. Another reason is due to the integer nature of the division. If (M-L) = 100, K = 50, and Z = 100, consider the result of computing (M-L)*(K/Z) instead of ((M-L)*K)/Z. For (M-L)*(K/Z), K/Z = 0 and the product is 0, while for ((M-L)*K)/Z, the product is 5000 and the quotient is 50. This illustrates the wide difference which can result from carelessness with the order of multiplication and division.

Once the division is done, we can add from our temporary, TEMP, to the quotient, which is conveniently in the A register. Our complete code is then

LDA B B ADD W B+W MUL Y (B+W)*Y INCX 2 ((B+W)*Y)+2 STX TEMP TEMP = ((B+W) * Y) + 2 LDA M M SUB L M-L MUL K (M-L) * K DIV Z ((M-L)*K) / Z ADD TEMP ((B+W)*Y) + 2 + ((M-L)* K)/Z

Overflow is a problem which we have not considered in this code. Notice that at any (arithmetic) instruction overflow could occur (except for the DIV, but if Z = 0, this will act the same as overflow). One of the nice features of MIX is that the overflow toggle is turned on whenever overflow occurs, but is not turned off when it does not occur. This means that if it is off before we begin to execute this code, and is on after the code is executed then overflow has occurred somewhere in the expression. Often it is not important to know exactly where overflow has occurred, but only if it has occurred at any time during the evaluation of the expression. This can be tested by one test at the end of the expression computation as,

JOV OVERFLOW

Remember that this will also turn the overflow toggle off so that it can be used for the next segment of code. This approach to testing overflow requires that the overflow toggle be off before the computation of the arithmetic expression is begun. How do we guarantee that it is off? The easiest way is to test for overflow whenever it may occur. This assures that as soon as it happens, it will be recognized and appropriate action taken. Alternatively, we will need to turn the overflow toggle off explicitly. Searching the list of machine instructions, we find that there is no instruction to turn off the overflow toggle. But consider

JOV *+1
This instruction will not jump if the overflow is off, and if it is on, it will turn the overflow off, and jump to *+1. In either case the next instruction will be at *+1, with the overflow toggle off.

4.2 JUMPS

The discussion of overflow brings up the subject of jumps and particularly conditional jumps. These instructions allow us to compute different functions based on the value of our data. Consider the absolute value function

if n < 0, then absolute value of n = -n otherwise absolute value of n = n.
To compute this we could write the following code
LDA N JANN 1F IF NOT NEGATIVE LEAVE ALONE LDAN N BUT OTHERWISE LOAD -N 1H EQU * A REGISTER HAS ABS OF N

When execution reaches the label 1H, the A register has the absolute value of N. The JANN will jump directly to 1H if the contents of the A register is positive or zero. If N was negative, then the LDAN would load the negative of N into the A register and then "drop through" to 1H.

Tests on the sign of a number or on the zero/nonzero nature of a number can be done by loading into a register and testing the register. The jump instructions allow the A and X registers and the index registers to be tested directly for positive, negative, zero, non-zero, non-positive, and non-negative states.

We can use these instructions to compute the sign function. The sign function of a number p is +1 if p is positive, zero if p is zero and -1 if p is negative. If we want to leave the sign function of p in the A register, we can write

LDA P JAZ 1F JAP 2F ENTA -1 NEGATIVE NUMBER JMP 1F 2H ENTA +1 POSITIVE NUMBER 1H EQU *
Follow this code for each of the three cases (p < 0, p = 0, p > 0) to make sure that it is correct.

Comparisons between two numbers, and computations based on this comparison, generally use the CMPA or CMPX instruction. For example, to execute the code at FGREAT if F > G, or FLESS if F < G, we can write

LDA F CMPA G JG FGREAT JL FLESS
What happens if F = G? In the above code, we "drop through" the tests. If we wanted to take the FGREAT path, then we should have written
LDA F CMPA G JGE FGREAT JL FLESS
and the JL could have been replaced by a JMP.

To compute the maximum of two numbers P and W, we could write

LDA P CMPA W JGE 1F LDA W 1H EQU *
This code leaves the maximum of P and W in the A register.

4.3 LOOPS

Jumps are a very important part of computers. Code which does not have any jumps is called straight-line code, since it is executed in a linear manner, one statement after another. Jumps are used in two ways. One is to produce conditional execution of code similar to an IF statement. The other use of jumps is to create loops. Loops allow code to be repeated until some condition is met. The condition may be that a particular property holds, or that the loop has been executed a certain number of times. Both of these are programmed by the use of conditional jumps.

Suppose we wish to find a number i such that i2N < (i + 1)2. We can do this by a loop which can be described as,

set i to 1. if i × i > N then i - 1 is our number otherwise set i to i + 1 and repeat the test again.
In MIXAL this would be
ENTA 1 STA I SET I TO 1 * 1H LDA I MUL I AX = I*I CMPX N COMPARE LOWER 5 BYTES JG FOUND IF GREATER, QUIT LOOP * LDA I INCA 1 SET I TO 1+1 STA I JMP 1B REPEAT LOOP * FOUND LDA I DECA 1 WANT I-1
This is a straightforward coding of the previous algorithm. We would hope that some local optimization would be able to improve that code, but the only obvious improvement is to move the label 1H back one location to the STA I in order to eliminate the second STA I before the JMP 1B.

Another kind of loop is the equivalent of the DO-loop of Fortran. In this kind of loop, we repeat execution of the loop a fixed number of times, as counted by an index variable. The DO-loop was specifically designed to allow its index variable to be stored in an index register. Thus the DO-loop

DO 10 I = 1, N <DO-loop body> 10 CONTINUE
can be written in MIX code as (letting index register 3 be the index variable I)
ENT3 1 1H EQU * <MIX code for DO-loop body> INC3 1 CMP3 N JLE 1B CHECK FOR END OF LOOP
For example, to add the numbers from 1 to N, we could write
ENT3 1 LOOP INDEX = 1..N ENTA 0 ACCUMULATOR OF SUM 1H INCA 0,3 INCREASE A BY CONTENTS OF 13 INC3 1 SET FOR NEXT LOOP CMP3 N IF ANY JLE 1B * A HAS SUM OF 1 TO N
Notice that since addition is commutative, we could also add our numbers backwards, from N to 1. The advantage of this is that our index can then be tested against zero, which eliminates the CMP3 instruction, making the code slightly shorter (and faster) as
LD3 N ENTA 0 ACCUMULATOR FOR SUM 1H INCA 0,3 A = A + INDEX 3 DEC3 1 J3P 1B CHECK FOR END OF LOOP * A HAS SUM OF 1 TO N.

Loops can be nested. When loops are nested, different index registers should be used to control the loops, of course.

4.4 ARRAYS

One of the primary reasons for the use of loops is the repetition of a piece of code on each element of an array. An array is a linear list of similar items which is indexed to specify any particular element. Arrays are represented in MIX programs by sequential locations in memory. Memory is allocated for an array by the ORIG pseudo-instruction. For example, an array of 100 integers can be declared by

BASE ORIG *+100
This statement moves the location counter of the assembly program ahead 100 locations, leaving 100 locations of memory for the array. The elements are accessed by BASE, BASE+1, BASE+2, …, BASE+99. The label (BASE in this case) is called the base address. To avoid having to write out each address separately, we use indexing to specify the address of each element in the array relative to the base address.

FIGURE 4.1 Storage allocation and accessing for an array.

To initialize an entire array to zero, we could use code such as

ENT3 0 INDEX FOR ARRAY REFERENCING 1H STZ ARRAY,3 SET ELEMENT 13 TO ZERO INC3 1 CMP3 =N= CHECK FOR END OF ARRAY JL 1B
where the symbol N has been defined as the length of the array, as for example,
N EQU 176 LENGTH OF ARRAY IS 176
and the array ARRAY has been defined as
ARRAY ORIG *+N
Notice that an actual number may be used instead of the symbol. For example, for an array of length 238 called XRAY,
XRAY ORIG *+238 RESERVE SPACE FOR XRAY ... ENT6 0 INDEX FOR LOOP AND SUBSCRIPT 7H STZ XRAY,6 INC6 1 CMP6 =238= CHECK FOR END OF ZERO LOOP JL 7B

The symbol is used rather than the actual number because (a) it can be more descriptive, and (b) if the length of the array must be changed later, only the value of the symbol need be changed (one line of code) if a symbol is used, instead of many lines of code (one for every time the length of the array is used).

As with the loops seen above, the index variable can often be manipulated to make the code simpler by running the index towards zero, rather than a non-zero quantity. For an array 3DIM of length 27, for example, we could zero it by

3LENGTH EQU 27 LENGTH OF 3DIM ARRAY 3DIM ORIG *+3LENGTH * ENT2 3LENGTH-1 LENGTH MINUS ONE 2H STZ 3DIM,2 ZERO 3DIM DEC2 1 J2NN 2B CHECK FOR ENTIRE ARRAY DONE

We can also index our array by quantities other than 0 to (the length of the array minus one). If we want an array called YEAR which is indexed from 1900 to 1984, we can declare it as

YEAR ORIG *+85
Then the label YEAR refers to the first location of the array. We want this to correspond to an index value of 1900, so the correspondence between addresses and index values is
address   index value
YEAR   1900
YEAR+1   1901
YEAR+2   1902
 
YEAR+84   1984

If index register 3 has the index value, then the address expression to access that element of the array is YEAR-1900,3. This is the quantity YEAR-1900 plus the contents of register 3. Since the index value is between 1900 and 1984, we have
    1900 ≤ I3 ≤ 1984,
    0 ≤ -1900+I3 ≤ 84,
and   YEARYEAR-1900+I3YEAR+84,
so our address will be in range. To illustrate it more concretely, suppose the array YEAR begins at location 1000 and continues to location 1084, then the address YEAR-1900,3 is 1000 - 1900,3 = -900,3. From 1900 ≤ I3 ≤ 1984, we have 1000 ≤ -900,3 ≤ 1084, again showing that the address which eventually results from the indexed address is correct.

The most common form of this type of array indexing is with arrays indexed from 1 to M, rather than 0 to M-1. For this case the address is (using index register 5 to hold the index value), ARRAY-1,5.

The types of processing which can be done on an array are almost endless, as are the variations of coding which can be used. For example, consider the array XRAY mentioned earlier, of length 238. XRAY can be zeroed by the code

ENT5 238 2H STZ XRAY-1,5 DEC5 1 J5P 2B
Notice our index runs from 238 down to 1, as the locations zeroed vary from XRAY-1+238 down to XRAY-1+1.

Consider how long this code takes to execute. The ENT5 is executed once and takes 1 unit of MIX time. The STZ takes 2 units, DEC5 1 unit, and J5P 1 unit, with each of these being executed 238 times. The total time is then 1 + 238 × (2 + 1 +1) = 953 units.

Now consider the code

ENT5 238 2H STZ XRAY-1,5 STZ XRAY-2,5 ZERO TWO ELEMENTS AT ONCE DEC5 2 J5P 2B
This code zeros the array the same as the previous loop, but it does it two elements at a time. Notice that it is one instruction longer, but the loop is only executed 119 times (238/2) rather than 238 times. The execution time is then 1 + 119 × (2 + 2 + 1 + 1) = 715 units, saving 238 units over our first code. (However, it should be pointed out that 238 units is much less than a millisecond, so unless this code was to be executed millions of times, the effort to save the computer time would take much longer than the savings). This is an example of a space-time tradeoff. We can trade one extra location in memory for 238 units of time (in this case). It is almost always possible to make a program run faster by making it more complex and hence longer in terms of the amount of space it uses. Similarly, it is generally possible to make a program use less space by making it take more time to execute. These tradeoffs (more time/less space; less time/more space) are most often the result of using different algorithms. Consideration of the time-versus-space tradeoffs of particular algorithms is very important when programming, since often the machine may have very limited memory available, or computer time is very expensive (or both).

Array loops need not be executed a fixed number of times. Suppose we have an array NAME ORIG *+500 and we want to search the first 37 entries for an entry whose value is in the A register, because we wish to know the index of such an item. Searching for an item in the A register can be

ENT1 0 INDEX FOR SEARCH 4H CMPA NAME,1 JE FOUND MATCH FOUND INC1 1 OTHERWISE INCREASE INDEX CMP1 =37= JL 4B IF STILL MORE TO CHECK

If the code "drops through", then the item is not in the first 37 elements of the array; if the code jumps to the label FOUND (someplace else in the program), index register 1 has the index of the item in the array NAME.

Now a "smart" programmer comes along and says "I can speed that up by running my index from 37 down to 1 rather than from 0 to 36." He proceeds to write the code

ENT1 37 4H CMPA NAME-1,1 JE FOUND DEC1 1 J1P 4B

Now several questions arise. First, how much time has been saved? Only the loop is changed, and it now has a DEC1 instead of an INC1 (but these take the same time) and a J1P instead of a CMP1, JL (saving 2 units and 1 memory location, plus the literal). Since the loop is executed 38 times, this saves 76 units. (Consider how much real time this is). Another question is, does the code do the same thing? Well, no, not quite. Notice that if FOUND is reached in this version, the value of index register 1 is one more than the value of index register 1 from the first version. This means that either all references which were NAME,1 must now be changed to NAME-1,1, or we need to modify the code slightly to run our index from 36 to 0 rather than 37 to 1, as

ENT1 36 4H CMPA NAME,1 JE FOUND DEC1 1 J1NN 4B
This makes the code the same except in one, not necessarily unlikely case. Suppose there are two (or more) items in the array which are equal to the A register. Our first code, running the index register from 0 to 36, would find the entry that matched which had the lowest index, since it searches from smallest index to largest. The code above would find the match with the highest index. This may make a very large difference to the code at FOUND.

Another difference between these two pieces of code is in the average time that it takes them to execute. Notice that for this code two times are important: how long it takes to execute if the item is not found, and how long it takes to execute if an item is found. If the items searched for tend to be at the beginning of the table most of the time, then searching from the front will be faster (on average), while if the items searched for tend to be towards the end of the table, then the search from the back will be faster (on average). Most often we may not know where our searched for items will tend to be (front or back), and so the choice between these two techniques should be made on other considerations (such as indicated above). Notice however that our not knowing which code will be faster does not mean that they both will take the same time; one of them will be faster on average, we just do not know which. The problem of selecting efficient searching and sorting algorithms has received considerable attention and is the subject of many other books.

Arrays need not be constructed of only one-word elements. Multiple-word elements are also possible. These arrays are sometimes called tables. Each element of a table is a record with several fields. For example, the MIXAL assembler builds a symbol table which is used to translate symbols in an assembly language program into their values. This requires two parts to each item, the name and the value. The name may be up to 10 characters long, so we need two words for it, plus one word for the value. Thus each record is three words long, with a format such as
first 5 characters of name
second 5 characters of name
value of name

Suppose that the symbol table is an array of up to 500 such records declared by

SYMBOL ORIG 3*500+* SAVE 500 ENTRIES 3 WORDS EACH
and LENGTH is a variable whose value is the number (from 0 to 1500) which is the number of words in the symbol table. The table is originally empty (LENGTH = 0) and as each new symbol is entered into it, the length is increased by 3. LENGTH then has the index of the next empty spot in the symbol table. Then we can search the symbol table for a name whose first five characters are in the A register and whose second five characters are in the X register by
ENT1 0 1H CMPA SYMBOL,1 JNE 2F CMPX SYMBOL+1,1 COMPARE SECOND HALF OF NAME JE FOUND 2H INC1 3 3 WORDS PER RECORD CMP1 LENGTH JL 1B
If the program drops through, the name in the AX register is not in the table; if control transfers to FOUND, the value of the name in AX is at SYMBOL+2,1.

One problem with the above might be if instead of LENGTH being the number of words in the table, suppose we only had NITEMS whose value was the number of records in the table. (So LENGTH = 3*NITEMS). From this we could compute the number of words by a multiplication or we could run our loop with two index registers, one for counting the number of items (0, 1, 2, …, NITEMS) and the other for indexing the symbol table (0, 3, 6, …, 3*(NITEMS-1)). This can be coded as

ENT1 1 COUNT NUMBER OF ITEMS ENT3 0 INDEX FOR SYMBOL TABLE 1H CMP1 NITEMS END OF TABLE? JG NOTFOUND CMPA SYMBOL,3 CHECK FIRST HALF JNE 2F CMPX SYMBOL+1,3 CHECK SECOND HALF JE FOUND 2H INC3 3 NOT YET, INCREMENT INDEX INC1 1 JMP 1B

Here we have made some other changes also. Control transfers to either FOUND or NOTFOUND, depending on whether the symbol is found or not. We have moved the test for completion of loop (I1 > NITEMS) to before the comparison to allow for a table with zero entries. If the item is found, SYMBOL+2,3 is the address of the value of the name in AX; if it is not found, I3 has the index of the first empty location in the table.

4.5 STACKS

One common use for an array is to implement a stack. A stack is a data structure along with a specific way of entering data onto and removing data from the stack. The classical example is a stack of plates or trays in a cafeteria. As plates are washed, they are added to the top of the stack; as they are needed they are removed from the top of the stack. This type of adding to the top and removing from the top mechanism is often very useful in computer systems (for example, with subroutines as explained in Chapter 6), and so we show here how a stack would be coded.

Two data structures are needed for a stack: an array, to store the stack and a variable to indicate the current number of items in the stack (top of stack pointer). As with tables, the items in a stack may be several words long, but for simplicity we assume that each item to be stacked is only one word. To provide storage for a stack, we define

TOP CON 0 TOP OF STACK INDEX STACK ORIG *+LENGTH SPACE FOR STACK
Notice that although the stack may have a variable number of elements in the stack, we must allocate a fixed amount of space for the stack. Thus, it is necessary to estimate the maximum size that the stack will ever grow to be, and allocate that much memory (or even a little more). This can be done by a LENGTH EQU n, where n is the amount of space wanted. By using a symbolic stack length indicator, we need only change the EQU statement if we decide we need more or less stack space.

FIGURE 4.2 Stack operations: the PUSH operation adds a new element to the top of the stack; the POP operation removes it.

Two operations are possible on a stack: PUSH and POP (or STACK/UNSTACK, ADD/REMOVE, etc.). The PUSH operation involves adding a new element to the top of the stack. This requires storing the element on the top of the stack and increasing the number of elements by one. Assuming the element is in the A register and index I5 is not in use, the code for this would be,

LD5 TOP TOP OF STACK POINTER CMP5 =LENGTH= CHECK IF STACK IS FULL JGE OVERFLOW IF SO, STACK OVERFLOW * STA STACK,5 STACK NEW ELEMENT INC5 1 ST5 TOP SET TOP = TOP + 1 *

This code checks to see that we do not store more elements in the stack than we have allocated space for. This type of error checking is good programming practice.

To remove an element from the stack and put it into the A register requires a similar sequence of code. This time we check to make sure that the stack is not empty before we remove an element from the stack. Also since the top-of-stack index (TOP) is the index of the next empty space in the stack array, we decrement it before loading the stack element.

LD5 TOP DEC5 1 SET TOP = TOP + 1 J5N UNDERFLOW CHECK IF STACK IS EMPTY * LDA STACK,5 LOAD TOP OF STACK ST5 TOP RESTORE TOP INDEX
An additional function on a stack would be a test for an empty stack.

4.6 CHARACTER MANIPULATION

In addition to arrays of multiple-word items, arrays may be of partial-word elements, and particularly of single characters (one byte). The techniques which are used for manipulating arrays of characters (called strings) are very important. One character requires one byte for storage, thus it would be very wasteful to use an entire word to store only one character. Also, certain I/O devices, such as the card reader and line printer, use character strings which are packed five characters per word. For computational purposes, however, we wish to use the string one character at a time, generally. This requires unpacking the string and extracting one character from it. Two techniques are commonly used on the MIX computer: partial field specifications and shifting.


FIGURE 4.3 Packed and unpacked strings of characters (characters must be packed for input and output, but it is more convenient to program unpacked characters).

As an example, suppose we have an input card in the 16 locations CARD, CARD+1, …, CARD+15, and we wish to count the number of blank characters on the card. We want to examine each of the 5 bytes in each of the 16 words of the card. Letting index register 2 be our counter, we can write the following code to do this, using partial field specifications.

ENT2 0 COUNTER FOR BLANKS ENT1 0 INDEX FOR WORDS, 0..15 1H LDA CARD,1(1:1) CHECK FIRST BYTE CMPA BLANK JNE *+2 INC2 1 FOUND A BLANK, COUNT IT LDA CARD,1(2:2) CHECK SECOND BYTE CMPA BLANK JNE *+2 INC2 1 LDA CARD,1(3:3) THIRD BYTE CMPA BLANK JNE *+2 INC2 1 LDA CARD,1(4:4) FOURTH BYTE CMPA BLANK JNE *+2 INC2 1 LDA CARD,1(5:5) FIFTH BYTE CMPA BLANK JNE *+2 INC2 1 * FINISHED WORD INC1 1 INDEX FOR NEXT WORD CMP1 =16= CHECK FOR LAST WORD JL 1B * INDEX 2 HAS BLANK COUNT

This is a very simplistic approach which has its advantages but is very tiring in terms of writing code. The only varying portion is the field specification of the LDA instruction. There are several approaches to shortening the code. One is through instruction modification. In this approach we notice that the field specification is stored in byte 4 of the load instruction and we want this to be successively (1:1), (2:2), (3:3), (4:4), (5:5). The following code does this by keeping in index register 3 the partial field we want to use and changing the LDA instruction for each execution of the loop.

ENT2 0 COUNTER OF BLANKS ENT1 0 INDEX FOR WORDS 1H ENT3 1:1 FIRST BYTE 2H ST3 *+1(4:4) MODIFY LDA FIELD LDA CARD,1(7:7) PARTIAL FIELD SET AT *-1 CMPA BLANK JNE *+2 INC2 1 COUNT A BLANK INC3 1:1 INCREASE FIELD SPECIFICATION CMP3 =5:5= CHECK END OF WORD JLE 2B INC1 1 IF END OF WORD, INCREASE INDEX CMP1 =16= CHECK END OF CARD JL 1B * INDEX 2 HAS BLANK COUNT

This shortens the code but is more difficult to follow. The problem is that in debugging this code, the instructions which are executed are not the instructions which are written in the assembly program listing. You cannot tell by looking at the listing what the instruction which used to be LDA CARD,1(7:7) is in fact when it is executed. This makes it very difficult to debug a program and is considered a poor programming practice.

The more common approach to character manipulation is through shifting. Consider the word
A B C D E
If we load this into the A register, the (1:1) byte will have the first character we wish to use. Then if we SLA 1, we have
B C D E 00
and the second character we want is now in the (1:1) byte. Another SLA 1 gives us
C D E 00 00
Another SLA 1 gives us
D E 00 00 00
and finally a last SLA 1 gives
E 00 00 00 00
This allows us to examine each character in the (1:1) byte of the A register without needing to do partial loads. The code to count blanks using this technique can be

ENT2 0 NUMBER OF BLANKS ENT1 0 WORD INDEX 0..15 7H ENT3 5 BYTE COUNTER 5..0 LDA CARD,1 PICK UP 5 BYTES 1H CMPA BLANK(1:1) BLANK HAS BLANK CHAR JNE *+2 INC2 1 FOUND A BLANK SLA 1 SHIFT NEXT BYTE INTO (1:1) DEC3 1 DECREASE BYTE COUNT J3P 1B IF STILL MORE BYTES THIS WORD INC1 1 NEXT WORD CMP1 =16= IF ANY LEFT JL 7B * INDEX 2 HAS BLANK COUNT

Other shift instructions can be used to vary the position of the character which we are processing. If we load into the X register and SLAX 1 before processing, Then the characters appear in byte (5:5) of the A register for processing as
A Register   X Register
         
 
S T U V W
 
  SLAX 1
 
        S
 
T U V W 00
 
  SLAX 1
 
      S T
 
U V W 00 00
Or we can load into the A register and shift circularly (SLC 1) into the X register.

This approach is fine for processing the characters sequentially from left to right, but what if we need only the 37th character from the card? The 37th character will be in word CARD+7, byte (2:2). How do we access it? First, how did we determine its location? Looking at the card as it is stored in memory
COLUMN OF CARD 1 2 3 4 5 6 7 8 910 11 80
WORD 0 0 0 0 0 1 1 1 1 1 2 15
BYTE 1 2 3 4 5 1 2 3 4 5 1  5
To obtain the word and byte number of a given column of the card, we notice that COLUMN NUMBER = BYTE NUMBER + 5*WORD NUMBER. If we divide the column number by 5, this almost gives us what we want in the quotient and remainder, but column 5 divided by 5 gives word 1 instead of word 0, so we subtract one from the column number first to give
WORD NUMBER = (COLUMN NUMBER-1) / 5
BYTE NUMBER = 1 + REMAINDER OF (COLUMN NUMBER-1)/5
Notice that the DIV instruction returns both of these numbers, one in the A register and the other in the X register.

Once we know the word number, we can use that as an index to load the appropriate word. Then, to get the correct byte we shift the loaded word so that the byte we want is in some convenient location (generally byte 1 or 5 of the A or X registers). For example, if we load into the X register and shift left the number of bytes which is the byte number (1, 2, 3, 4, or 5), the desired byte will be in byte 5 of the A register. To illustrate, suppose register I5 has the column number of the character we want. Then our code can be as follows. First, put the column number minus one in the X register and zero the A register for the DIV.

ENTA 0 ENTX -1,5 COLUMN NUMBER MINUS ONE DIV =5=
Now the A register has the quotient and the X register has the remainder. Both of these need to go into index registers. Unfortunately, MIX does not provide convenient ways to move information from the A or X registers to the index registers, except via memory. One way to do this would be to store them in a temporary as
STA TEMP LD2 TEMP STX TEMP LD3 TEMP
Another way is self-modifying code.
STA *+1(0:2) ENT2 0 STX *+1(0:2) ENT3 0
The address fields of the ENT2 and ENT3 instructions are modified at execution time to the values of the A and X registers, respectively. This saves one memory location (TEMP) and is only six time units instead of eight.

Once we have the word number in I2 and the byte number in I3, we can obtain the character we want by

LDX CARD,2 SLAX 1,3

The character is now in byte 5 of the A register. We could have put it in byte 1 of the A register by

LDA CARD,2 SLA 0,3
or by
LDX CARD,2 SLAX 5,3
or by
LDA CARD,2 SLC 10,3
among others.

Code such as the above allows us to specify characters by a character or column number, and hence ignore the packed nature of the characters in memory. However, notice that this takes a fair amount of computer time (the DIV instruction has the longest execution time of any instruction in MIX) and it also requires the use of both the A and X registers and two index registers. Thus, it may be better to break the processing of a card into two parts: first, unpack the card, and second, process the unpacked form. This allows the character number to be used as an index into the unpacked character array. After being processed, the data may need to be packed again for output.

Consider a program which reads a card and squeezes out all multiple blanks between the words on a card. The input might be English text which was prepared on a keypunch whose space bar sticks and sometimes inserts several blanks, between words. Our overall approach for the program could be to first read the card into a 16-word array CARD. Then it is unpacked into the 80-word array CHAR. From this array we move the card to an array OUTPUT without multiple blanks, and then finally pack it into an array called LINE for output (say to a card punch). Ignoring the actual input and output (which is the subject of Chapter 5), we can write the following code.

* * DATA ELEMENTS * CARD ORIG *+16 INPUT CARD IMAGE CHAR ORIG *+80 UNPACKED INPUT IMAGE OUTPUT ORIG *+80 UNPACKED COMPRESSED OUTPUT LINE ORIG *+16 PACKED OUTPUT * <code to read into CARD> * * UNPACK THE CARD IMAGE OF 16 WORDS, * 5 CHARACTERS PER WORD FROM CARD INTO CHAR. * UNPACKING IS DONE RIGHT TO LEFT TO ALLOW * REGISTERS TO BE TESTED AGAINST ZERO. * * REGISTER USAGE * * I1 = WORD COUNTER FOR CARD 15..0 * I2 = BYTE COUNTER 4..0 * I3 = INDEX FOR CHAR 79..0 * A = LOADED WORD SHIFTED TO GIVE THE RIGHT * CHARACTER IN BYTE 5 ENT3 79 CHARACTER COUNTER 79..0 ENT1 15 COUNTER OF WORDS 15..0 NEWWORD LDA CARD,1 ENT2 4 COUNTER OF BYTES 4..0 NEXTCHAR STA CHAR,3(5:5) DEC3 1 DECREASE INDEX INTO CHAR SRA 1 SHIFT WORD IN A RIGHT ONE DEC2 1 ONE LESS BYTE J2NN NEXTCHAR DEC1 1 ONE LESS WORD J1NN NEWWORD * * MOVE CHARACTERS FROM CHAR TO OUTPUT ONE AT A * TIME. IF A BLANK IS FOUND, SET A FLAG TO * PREVENT ANY MORE BLANKS FROM BEING STORED. I1 * IS INDEX FOR CHAR, I2 INDEX FOR OUTPUT. FLAG * FOR MULTIPLE BLANKS IS HELD IN REGISTER I3. * CODE MAKES USE OF MIX CODE FOR BLANK BEING * ZERO. * ENT1 -80 INDEX FOR CHAR -80..-1 ENT2 0 INDEX FOR OUTPUT 0..79 ENT3 STOREBLK FIRST BLANK SHOULD BE STORED * ANOTHER LDA CHAR+80,1 LOAD NEXT CHARACTER JMP 0,3 PROCESS THIS CHARACTER * AT STOREBLK OR NOSTORE. STOREBLK JANZ STORE STORE ALL, BUT IF BLANK SET ENT3 NOSTORE TO SKIP ANY FUTURE BLANKS JMP STORE * NOSTORE JAZ SKIP IF BLANK, SKIP IT ENT3 STOREBLK IF NOT SET TO STORE NEXT * * STORE STA OUTPUT,2 STORE THE CHAR IN OUTPUT INC2 1 SKIP INC1 1 MOVE TO NEXT CHARACTER J1N ANOTHER * * LINE NOW COMPRESSED, MOVE TO LINE ARRAY. * FIRST BLANK THE ENTIRE ARRAY * ENT1 LINE+1 STZ LINE MOVE LINE(15) ZERO (BLANK) LINE ARRAY * * NOW PACK CHARACTERS BACK INTO ARRAY LINE FROM * OUTPUT. I2 HAS THE LENGTH OF THE ARRAY OUTPUT. * SINCE EXTRA BLANKS HAVE BEEN DELETED, MAY BE * LESS THAN 80. I1 WILL INDEX THE PACKED WORDS * OF LINE, I4 INDEXES OUTPUT. CHARACTERS WILL * BE LOADED INTO BYTE 5 OF A, SHIFTED TO BYTE 1 * OF A AND THEN SHIFTED INTO THE X REGISTER. * ENT4 0 ENT1 0 WORDS OF LINE 0..15 PACKWORD ENT3 4 NUMBER OF BYTES IN THIS WORD * PACKNEXT LDA OUTPUT,4(5:5) INC4 1 FETCH NEXT CHARACTER NEXT TIME SLA 4 SHIFT CHAR INTO BYTE 1 SLC 1 SHIFT INTO X DEC2 1 J2NP LASTONE COUNTER OF CHARACTERS IN CHAR DEC3 1 BYTE COUNT J3NN PACKNEXT STX LINE,1 WHEN 5 BYTES PACKED STORE IN INC1 1 NEXT WORD INDEX JMP PACKWORD * LASTONE SLC 0,3 JUSTIFY LAST PARTIAL WORD STX LINE,1 * <code to output LINE >

4.7 LEXICAL SCANNING

A very common form of character manipulation is lexical analysis. Many programs operate on character data which is constructed from free-format multi-character words or symbols. The words or symbols are separated by blanks, special characters, or other delimiters. To operate on them, it is necessary to group together contiguous characters into words or symbols. This is lexical analysis, or lexical scanning.

For example, the entire text of this book has been stored on magnetic tape. To insure that there are no spelling errors, a program was written which produced a list of all of the different words in this book. This list was scanned for misspelled words. To produce the list, it was necessary to scan a line of characters and identify where each word began and ended. Then these individual characters were packed into a single item and stored in a table.

Assume that no single word or symbol (called a token) is more than 10 characters long. It could be packed into the A and X registers, using shift instructions as shown in the last section. Then, to find all the words on a card, we could write a program such as

* * READ A CARD * NEXTCARD <code to read a card into the array CARD> * * UNPACK THE CARD INTO THE ONE CHARACTER PER WORD * FORMAT IN THE ARRAY CHAR. * <code to unpack CARD into CHAR> * * NOW SCAN THE CHARACTER ARRAY CHAR UNTIL THE FIRST * ALPHABETIC CHARACTER IS FOUND. TREAT BLANKS (00) * AND SPECIAL CHARACTERS (>39) AS DELIMITERS. * * INDEX REGISTER 6 POINTS TO THE END OF A TOKEN, * WHILE INDEX REGISTER 5 POINTS TO THE BEGINNING. * ENT6 -1 * * SKIP LEADING DELIMITERS FIRST * NEXTOKEN INC6 1 CMP6 =80= TEST END OF CARD JGE NEXTCARD * LDA CHAR,6 GET NEXT CHARACTER JAZ NEXTOKEN BLANK CMPA =40= OR SPECIAL JGE NEXTOKEN * * FOUND NONBLANK, NONSPECIAL * SET INDEX REGISTER 5 * THEN LOOK FOR NEXT BLANK OR SPECIAL * ENT5 0,6 * 1H INC6 1 CHECK NEXT CHARACTER LDA CHAR,6 JAZ ENDTOKEN IF BLANK CMPA =40= JL 1B OR SPECIAL * * TOKEN IS IN CHAR,5 TO CHAR-1,6 * ENDTOKEN <pack token into A and X register> * <process new token> * JMP NEXTOKEN REPEAT WITH NEXT *

4.8 SUMMARY

In this chapter, we have presented many different assembly language programming techniques. The techniques presented include how to compute arithmetic expressions, how to program conditionals and loops, examples of the use of arrays and tables, and some simple character manipulation techniques. Most of these concepts are easy and should quickly become a matter of habit. These are basic programming techniques; there are many others equally simple, and also a large number of more advanced techniques. The more practice you have in programming, the more skilled you will become at the art of assembly language programming. Additional programming concepts will be introduced throughout the remainder of this text.

Most assembly language programming techniques are developed by assembly language programmers by trial and error and are not written down for study. Many are passed from programmer to programmer informally, and many are picked up by reading the programs of a more experienced programmer. The program segments in the three volumes of The Art of Computer Programming by Knuth (1968, 1969, 1973) are good sources for MIX. Gear (1974) has a chapter on programming techniques, as does Eckhouse (1975). Stone and Siewiorek (1976) have many examples. In all cases except Knuth, however, these examples are for a computer other than the MIX computer. Thus, it is generally necessary to understand the computer instruction set and assembly language being used as the example language of that book before the programming techniques can be transferred to your MIX programs.

EXERCISES

  1. What is the difference between programming and coding?
  2. Write MIXAL code to compute the value of the expression, (J*LOGICAL)/PHYSICAL + BASE.
  3. Write the MIXAL code to compute Z = Y+2*(W+V)/4-6*(10-W-V).
  4. Section 4.2 gives a four-statement program to compute the absolute value of a number. This was done to illustrate the use of jumps. Can you give one statement which calculates the absolute value of a number?
  5. Section 4.2 also gives a seven-statement program to compute the sign function. Can you give a shorter program to do the same, but without jumps?
  6. Why would a programmer write
    LABEL EQU *
    rather than just attaching the label to the next instruction?
  7. Write a code segment which simulates the MOVE instruction, but without using the MOVE instruction,
  8. Why are some loops run backwards? Consider the two loops in Section 4.3 to compute the sum of the numbers from 1 to N. How much time does each take to execute for N = 3? For N = 10? For N = n?
  9. How does an array differ from a table?
  10. Consider the following code to index an array with a subscript in the range 400 to 800.
    ARRAY ORIG *+401 ... LD1 INDEX DEC1 400 LDA ARRAY,1
    Can you improve on this code? How is your code better?
  11. What is a space-time tradeoff?
  12. Write the code to add two double-precision integers in MIX. Define your data representation,
  13. The stacks shown in Section 4.5 grow up, from low memory to high memory. Give the code for the PUSH and POP operations which would manipulate a stack which grows down from high memory to low memory,
  14. Three different pieces of code were given to count the number of blanks on a card. How much time and how much space does each take?
  15. Write a program to read a sequence of numbers and compute their maximum, minimum, and average. What problems arise in computing the average?
  16. Write a program to read a set of numbers, calculate their sum, and then print the numbers and, for each number, its percentage of the sum. Do the printed percentages add up to 100? If not, why?
  17. Write a program to read a text file and produce a table of all of the different words, and the number of times they occur. Consider printing the list either alphabetically or in order of decreasing frequency of words.