SMALL COMPUTER ARCHITECTURES FOR THE COMPILATION OF HIGHER LEVEL LANGUAGES

James L. Peterson

Systems Programming Laboratory Note 3
SPL/3

Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712

September, 1977

ABSTRACT

Based upon recent experience in writing a compiler for the Nova computer, some ideas are put forth on architectural considerations for the design of small computers (minicomputers and microcomputers) which are to be used with higher-level languages.

1. Introduction

Within the last few years, the development of microcomputers has changed the way in which computers are being used. The computing literature is constantly filled with predictions of the varied and widespread use of these small computers throughout our society. This is primarily due to both their small physical size and low cost. Mass production of millions of processors will reduce prices even more.

However, it is unlikely that these devices will propagate nearly as fast as propsed for one simple reason: software. The new microprocessors, despite their low hardware cost still require programming and may in fact require more difficult and lengthy programming than their larger cousins due to limited memory and unsuited instruction sets. Most programming for small computers is still done in assembly language at a considerable cost in programmer time. Programmer productivity can be greatly increased by the use of a higher level language.

The advantages of higher level language programming have been presented many times and we will not repeat them here. We merely point out that if possible, a higher level language should be used rather than assembly language. This may result in a slightly larger and slower program than one written in assembly language, but the savings in programming costs will offset these hardware costs in many cases, particularly for prototypes and limited editions.

However, most current small computers are designed mainly with hardware constraints in mind, and hence are not entirely suitable for the compilation of higher-level language programs. Their instruction sets are filled with "features" which are all but useless for compiled code, while they lack the simple instructions which would make compilation easier and allow efficient execution of higher-level language programs.

As an example, consider the "indirect addressing" feature provided on most small computers. There is almost no way that a compiler can use such a feature. Particularly useless are multiple levels of indirection. The usual use for indirection is when no ability exists to index into an array. In this case the address must be calculated, stored into memory and used as an indirect address. The much more useful ability to index into memory from a register eliminates the need for indirection.

In the sections below, we present the types of instructions which are needed for compiled code, based upon recent experience with a Pascal compiler.

2. Addressability and Data Allocation

The most important problem for a compiler is the generation of code necessary to access data. This problem is caused mainly by the short word length of the processor which does not allow an arbitrary address to be specified in a one-word instruction. This has given rise to the multiplicity of addressing modes available on various machines such as zero-page/current-page, indirection, indexing, base registers and stack addressing. It is not possible to critique all of these addressing modes for all programs but the following general statements can be made.

Access must be easily made to both global variables and local variables. In a block structured language, this refers to the outermost and innermost blocks; intermediate blocks are less often accessed. For a Fortran-like language this refers to both COMMON variables and local variables. A language which requires that all variable be declared may have an edge here for a one-pass compiler.
An indexing capability for array accesses is needed. It should be possible to take an arbitrary integer expression and use it as an index into an array. The more indexing registers available the better.
Constants should be immediately available at any time. This can be best provided by embedding the constants directly into the compiled code at the place of use by immediate instructions.
Addresses of variables and array elements should be easily generated, either as constants or by simple instruction sequences. Consider the "load address" instruction of the IBM 370s.
A stack is not strictlly needed, but can be useful. However, if a stack is to be used for other than arithmetic temporary storage or return addresses, instructions must allow stack-based addressing and the generation of addresses in the stack. Notice that this capability might be better provided by an additional index register which could be dedicated as a stack pointer in those software systems which need a stack.
Such concepts as indirection or special auto-increment or auto-decrement locations, as provided on the PDP-8 or Nova computers are basically useless to a compiler.

3. Arithmetic and Logical Operations

Register allocation is necessary only if there are registers for allocating. However, if registers are provided for arithmetic operands, all operations should apply equally to all registers in a uniform manner. Otherwise the register allocation problem becomes very complicated. Since arithmetic expressions tend to be relatively simple, four to eight register which are not being used for other purposes (i.e., as base registers, index registers, stack pointers, etc.) should be quite sufficient.

Operands for all operations should be treated as signed numbers with appropriate overflow conditions readily detectable.

Addition and subtraction should need no special comments. However, multiplication and division cause special problems here because of the double register nature of the product of a multiplication. However, few programmers expect double-length results, since there are typically no instructions for manipulating them. The best solution might be a single length product with an overflow indicator. Separate instructions can provide both the quotient and remainder for division.

It is recognized that multiplication and division may not be directly implemented, but it would be convenient to define the instruction set to include these operations for later expansion.

Comparisons and conditional jumps are another source of possible problems. The threat of overflow makes the subtract-and-compare-with-zero approach to comparisons infeasible. A separate compare instruction is needed which either puts the result in a register or sets a condition code indicator. This compare should at last allow comparisons for equality, greater than and greater than or equal to, keeping in mind the signed nature of the operands. Conditional and unconditional jumps should be able to transfer control to any part of the current procedure, without limit to its size.

Most features of current machines are again not useful for compiled code. Skip-type conditionals are merely an excuse for a two-word jump-type conditional and can be better explained as a two-word conditional jump.

Similarly shifts and logical operations are an excuse for an inability to load an arbitrary field of a packed word. A compiler would be better able to used a specific "load packed field/store packed field" instruction. In a language such as Pascal, where the compiler controls all packing, the field boundaries and lengths would be known at compile time.

4. Procedure and Subroutine Calls

A jump to subroutine instruction, which provides the return address, is certainly needed, but most other capabilities have already been mentioned in previous sections. Instructions to save and restore all registers might be useful. The major problem which must be faced with subroutine calls is the sudden change in addressing environment. This should not require more than the resetting of a few registers.

5. Summary

We have tried to sketch the important features of a small computer which are needed for a compiler of a higher-level language. The most important problem is the addressability of both data and program locations. To illustrate the general requirements for a compiler, we include in the Appendix a list of all the code generation procedures from a recent compiler for a Pascal subset.

Appendix

Load Constant (v,r) -- put the constant v into the register r.
Gen Address (v) -- generate the address of v.
Gen Load (v,r) -- load the expression v into register r.
Gen Operation (v1,op,v2) -- generate the code for the operation op between expressions v1 and v2. Operations include addition, subtraction, multiplication, division (both DIV and MOD), tests for equality, nonequality, greater, less, greater or equal, and less or equal, AND, OR, and NOT.
Gen Array Address (v,p) -- index the array v by the expression p.
Gen Procedure Call (t) -- call the procedure t.
Gen Parameter (i,v) -- v is the ith parameter for a procedure call.
Gen Assignment (v1,v2) -- assign the value of the expression v2 to the address v2.
Gen Jump (k) -- jump to the label k.
Gen Test (v,k) -- test the boolean expression v and jump to k if v is false.
Gen Enter Procedure (t) -- generate the prolog for entry to procedure t.
Gen Leave Procedure (t) -- generate the epilog for exit from procedure t.
Gen Global Variables -- generate the code to declare global variables.
Gen Enter Main -- prolog for the main program.
Gen Leave Main -- epilog for the main program.

Home

Comments?