USING PASCAL ON THE NOVAS




James L. Peterson




Systems Programming Laboratory Note 4
SPL/4



Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712




January, 1978




ABSTRACT

This note describes the procedure for using the Pascal compiler on the Nova computer system of the Department of Computer Sciences at the University of Texas at Austin. It also indicates the limitations of the system and how they can be overcome.


1. Introduction

A compiler for a Pascal-like language (hereafter called Pascal for brevity) has been written and is operational on the Nova computer system. The compiler translates programs written in a subset of Pascal into Nova assembly language. These assembly language programs can then be assembled, loaded, and executed in the normal manner.

The language subset which is compiled is defined further in [1], and the code generation approach is discussed in [2]. This note describes how to use the compiler, the language subset which is accepted and how to live with it, and how to interact with the current operating system, RDOS. It is the principal user's manual for Nova Pascal.

2. Using the Compiler

The compiler exists as a set of files, as follows:

    PASC.PA - The Pascal compiler (in Pascal)
    PASC.SR - The Pascal compiler in Nova assembly language
    PASC.RB - The Pascal compiler in relocatable binary
    PASC.SV - The executable core image of the Pascal compiler

The relationship amongst these files is illustrated in Figure 1.






Figure 1. The relationships amongst the Pascal compiler files. This also illustrates the steps involved in compiling, assembling, loading, and executing a Pascal program.

The compiler save file (PASC.SV) has been moved into both DP0 and DP0F, and may be executed from either of these two directories. Typical use of the compiler would involve:
  1. Write your Pascal program, limiting the program to the available subset of Pascal (see Section 5 below).

  2. Create a text file of your program using any of the text editors or other means such as cards and the card reader. We suggest that all Pascal programs be given the extension .PA (file.PA).

  3. Run the Pascal compiler. The Pascal compiler is executed by the CLI command,

    PASC

    The compiler will respond with,

    INPUT FILE: file.pa
    OUTPUT FILE: file.sr

    asking for the names of the input and output files to be used. The input file should be the Pascal file created in step 2 above (file.PA); the output file may be any file. We suggest an extension of .SR (file.SR).

  4. Run the assembler on the output file of the compiler.

    MAC file
    or
    MAC file $LPT/L


  5. Run the loader on the output of the assembly,

    RLDR file


  6. Execute the compiled program.

    file

The compiler translates from Pascal source code directly into Nova assembly code, and needs no run-time support or library routines. All code is generated in-line; no subroutine calls, macros, or interpretation are used. (This is in contrast to other Pascal compilers based on the Pascal-P4 compiler, which generates code for a hypothetical stack machine which must then be interpreted.)

3. Compiler requirements

The compiler is meant to run under RDOS 6.10, the Data General operating system, and makes appropriate system calls for all input/output. It requires at least 26K words of memory, plus whatever is needed for the operating system. The 26K of memory allows a symbol table of up to 650 symbols. Since each symbol table entry is 11 words, the table occupies 650 * 11 = 7150 words, so over a fourth of the compiler is dedicated to the symbol table. Over 600 symbols are needed to compile the compiler itself, but for smaller user programs, a small compiler with a symbol table of about 200 symbols (2200 words) would reduce the memory requirements of the compiler to approximately 21K words.

The amount of memory needed increases if excessive syntactic nesting occurs. Nesting refers to such syntactic items as:

  1. Nested procedures.

  2. Parenthesized expressions.

  3. Compound statements.

  4. Begin/End blocks.

  5. Structured data-types.

Each of these syntactic items causes recursion in the compiler, generating stacked variables. Excessive stacking may cause the stack to exceed available memory, causing a trap. This can be overcome by increasing the memory available to the compiler with the SMEM command.

4. Error handling

The compiler is not a robust program and works best when no errors occur. Three types of errors may occur: file errors, syntax errors, and compiler errors.

4.1. File errors

File errors are any errors in the operating system interface, that is, in the input and output file definitions and operations. No such errors are recognized by the compiler. This can result in an infinite loop in the compiler. For example, a non-existent input file will be treated as an infinitely long file of null characters.

4.2. Syntax errors

Syntax errors are detected by the compiler and simple recovery techniques are attempted. Errors are indicated by an output line of the following form:

ERR n COL m

after the line which caused the error. This indicates that error n occurred in column m. Errors are listed in Figure 2. The line in error may be the immediately previous line or the one before it. For error number 3 (looking for one syntactic token type but found a different one), the additional information of what token type was found and what type was expected is indicated on the next line as,

F=(found token type) (expected token type)

Token types are listed in Figure 3.

4.3. Compiler errors

A further problem is unrecoverable errors in the compiler. Unrecoverable errors are indicated by a line of the following form at the end of the output file,

**n

where n is the error number as indicated below.

  1. Too many errors (>600).

  2. Infinite search loop in symbol table.

  3. Same symbol inserted into symbol table twice.

  4. Too many symbols deleted from symbol table.

  5. Too many levels of nesting (>8).

5. The language subset

As indicated in SPL/1 [1], the language provided is a subset of the language Pascal. The following language features are NOT provided by the compiler.

  1. Floating point numbers.

  2. User defined scalar types.

  3. Pointers.

  4. Sets.

  5. Multi-dimensional arrays.

  6. FOR, CASE, and WITH statements.

  7. Input/Output (READ,WRITE,GET,PUT,EOF,files,...).

  8. Most system functions or procedures (SIN,COS,ABS,ODD,...).

  9. GO TOs and labels.

Each omission had its own reasons, but in the main these features were felt to be either not strictly necessary or very hard to implement, and hence were left out for simplicity. In some cases, such as floating point numbers, the hardware simply does not provide adequate support for the feature. These features can be added by appropriate programming by the user, if necessary.

In other cases, alternative methods exist for the omitted items:

  1. A user defined type, such as COLOR = (RED, GREEN, BLUE), can be defined by CONST RED=0; GREEN=1; BLUE=2; ... TYPE COLOR = RED..BLUE;

  2. Pointers would require runtime storage allocation routines. If these are desired, allocate a large array and use array subscripts as pointers.

  3. For sets, use a packed array of booleans.

  4. Multi-dimensional arrays can be defined as arrays of arrays, as with A: Array [0..n] of Array [1..m] of 0..127; and then accessed as A[I][J].

  5. FOR and CASE statements can be rewritten as WHILEs and IFs. WITH statements will require writing the complete record name at all times. Admittedly, there is a certain amount of efficiency lost here.

  6. Input/Output can be accomplished by using system calls as indicated below.
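
The rewrites suggested in items 3 and 5 above might look like the following fragments. This is only a sketch: the names N, I, K, SUM, A, and MEMBER are invented for illustration, the declarations and statements are shown together without the surrounding program text, and it is assumed that BOOLEAN, TRUE, and FALSE are predefined as in standard Pascal.

```pascal
(* Hypothetical declarations, for illustration only. *)
CONST N = 10;
VAR I, K, SUM: INTEGER;
    A: ARRAY [1..N] OF INTEGER;
    MEMBER: PACKED ARRAY [0..15] OF BOOLEAN;

(* FOR I := 1 TO N DO SUM := SUM + A[I];  rewritten as a WHILE. *)
SUM := 0;
I := 1;
WHILE I <= N DO
BEGIN
	SUM := SUM + A[I];
	I := I + 1;
END;

(* CASE K OF 1: SUM := 0; 2: SUM := 1 END;  rewritten as IFs. *)
IF K = 1 THEN SUM := 0
ELSE IF K = 2 THEN SUM := 1;

(* A set of 0..15 kept as an array of booleans. *)
(* The empty set: clear every element.           *)
I := 0;
WHILE I <= 15 DO
BEGIN
	MEMBER[I] := FALSE;
	I := I + 1;
END;

(* Add 3 to the set, then test whether K is a member. *)
MEMBER[3] := TRUE;
IF MEMBER[K] THEN SUM := SUM + 1;
```

Note that the WHILE form re-evaluates its limit on every iteration, so the limit should not be changed inside the loop body.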

As one minor extension to the language, the compiler allows compile-time constant expressions to be used wherever a constant may be used, as in,

CONST SPACE = 2*n+1;
TYPE INDEX = 0..n-1;

6. Interacting with RDOS

All input/output is done by the operating system in response to user requests. To allow these requests to be made from any Pascal program, two intrinsic functions have been defined and added to the compiler. SYSTEM is a function which takes a variable number of parameters and makes a system call to RDOS. Since some of the system calls must pass addresses as parameters to the system, another intrinsic function, ADR, returns as its value the address of its argument.

  1. ADR -- The function ADR takes one argument and returns as its value the integer number which is the address in memory of the argument. Thus for a variable X, ADR(X) is the address of X. Since ADR(X) is an integer, it can be operated on by arithmetic operators. In particular, many system calls require a byte pointer, rather than a word address. Since there are two bytes per word, a word address can be converted into a byte pointer by multiplying by 2. For example, if the array F has a file name packed two bytes per word, then 2*ADR(F) is a byte pointer to the file name in F. For a character variable C, on the other hand, we need the low-order byte, 2*ADR(C)+1.

    Note that although the function ADR allows addresses to be manipulated by the Pascal programmer, the programmer has no mechanism, except the operating system, to interpret the integer number which is produced as an address; that is, there is no MEMORY array or function which returns the value of the location in memory whose address is given as an integer. Hence ADR is useful only for system calls.

  2. SYSTEM -- The function SYSTEM compiles to a call on the RDOS operating system. The value of the function is the status value returned by RDOS. A value of zero indicates no error; a non-zero value indicates an error or special condition. The meaning of error codes can be determined by consulting the RDOS reference manual. Note that since RDOS error codes start at zero, and we have decided to use a zero error code as meaning no error, all RDOS error codes are increased by one before being returned to the user. Thus the error code 6 (end-of-file) in the RDOS manual is a status of 7 in Pascal.

    The SYSTEM function can have up to five parameters. The first parameter defines the type of system call to be made, and the second is an optional channel number. The remaining three parameters specify the values for registers AC0, AC1, and AC2 in the system call. Parameters that are not needed need not be given.

    The system call,


    STATUS := SYSTEM(call,channel,a0,a1,a2);

    is translated into the following code,
    load AC0 with a0
    load AC1 with a1
    load AC2 with a2
    .SYSTM
    .call channel
    if error, then STATUS := 1 + RDOS error code
    else STATUS := 0;

To illustrate the use of this facility, consider a program which inputs characters from a file INPUT and outputs them to a file OUTPUT. First the input file must be opened to a channel, say channel 3. The open call requires a byte pointer to the name of the file in AC0 and a mask in AC1.


Status := system(open,3,2*ADR("INPUT"),0);

For the output file, we must first create it, then open it.

status := system(creat,2*ADR("OUTPUT"));
status := system(open,2,2*adr("OUTPUT"),0);

To read a character into a variable C of type 0..127, we use the RDS (read sequentially n bytes) system call with a byte count of one. Note that RDOS only reads into the lower byte, so the upper byte must be zeroed to avoid incorrect comparisons with other characters.


c := 0;
status := system(rds,3,2*adr(c)+1,1);

To write C to OUTPUT, we simply use the WRS (write sequential) call,

status := system(wrs,2,2*adr(c)+1,1);
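
Putting the calls above together, the body of the copy program might be sketched as follows. This is only a sketch: it assumes the declarations named in the leading comment, relies on end-of-file being reported as status 7 as described under SYSTEM above, and does not close the channels, since the call for that is not covered in this note.

```pascal
(* Assumed declarations: C: 0..127; STATUS: INTEGER. *)

(* Open INPUT on channel 3; create OUTPUT and open it on channel 2. *)
STATUS := SYSTEM(OPEN,3,2*ADR("INPUT"),0);
IF STATUS = 0 THEN STATUS := SYSTEM(CREAT,2*ADR("OUTPUT"));
IF STATUS = 0 THEN STATUS := SYSTEM(OPEN,2,2*ADR("OUTPUT"),0);

(* Read the first character, if the files were opened without error. *)
IF STATUS = 0 THEN
BEGIN
	C := 0;
	STATUS := SYSTEM(RDS,3,2*ADR(C)+1,1);
END;

(* Copy until any call returns a non-zero status. *)
WHILE STATUS = 0 DO
BEGIN
	STATUS := SYSTEM(WRS,2,2*ADR(C)+1,1);
	IF STATUS = 0 THEN
	BEGIN
		C := 0;
		STATUS := SYSTEM(RDS,3,2*ADR(C)+1,1);
	END;
END;

(* Here STATUS = 7 means end of file on INPUT (RDOS code 6 plus one); *)
(* any other value is an error from one of the calls above.           *)
```

Testing the status after every call matters here because, as noted in Section 4.1, the compiler itself does not detect file errors; the same is true of the code it generates unless the program checks.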

Strings can be output for messages by,

status := system(wrs,2,2*adr("BEGIN TEST"), 10);

Numbers are output by changing them to characters and outputting each character.

I := 0;
REPEAT
	I := I + 1;
	DIGIT[I] := N MOD 10;
	N := N DIV 10;
UNTIL N = 0;

REPEAT
	C := 48+DIGIT[I];
	STATUS := SYSTEM(WRS,2,2*ADR(C)+1,1);
	I := I - 1;
UNTIL I = 0;

The same approach is used for reading numbers.
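
A sketch of the reverse conversion, under the same assumptions as the output loops above (C of type 0..127, N and STATUS integers, and the input file already open on channel 3), might read decimal digits until a non-digit character or a non-zero status is met. Character codes 48 through 57 are the digits 0 through 9. No provision is made for a sign or for overflow.

```pascal
(* Read an unsigned decimal number from channel 3 into N. *)
N := 0;
C := 0;
STATUS := SYSTEM(RDS,3,2*ADR(C)+1,1);
WHILE (STATUS = 0) AND (C >= 48) AND (C <= 57) DO
BEGIN
	(* Shift the digits read so far one place left and add the new one. *)
	N := 10*N + (C - 48);
	C := 0;
	STATUS := SYSTEM(RDS,3,2*ADR(C)+1,1);
END;
(* On exit, C holds the character which ended the number, if STATUS = 0. *)
```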

The generality of this operating system interface should allow programs to be freely written which modify files, directories, devices, memory, and interact with the operating system in all possible ways.

7. Problems with the compiler

The compiler has some known problems. At this point they are felt to be minor inconveniences which are, however, difficult to fix, and so have not been fixed.

  1. Packed structures -- Although the compiler recognizes the PACKED keyword, no code is generated to effect this declaration. All packing and unpacking must be done by the user by multiplication and division, if necessary.

  2. Multiple Word Assignments and Comparisons -- These are not compiled correctly. It is necessary for the user to write the appropriate loop to affect all elements of multi-word structures.

  3. Constant Subscripts and Record Offsets -- The compiler may generate offsets which exceed the -128..+127 range of allowable offsets in memory reference instructions. For example,

    X: array [0..512] of integer;
    X[512] := 0;

    will generate

    <load zero into register r>
    LDA 2,X        ; load address of array X into register 2
    STA r,512,X    ; store zero into X[512]

    and the offset 512 is an assembly error. This can be avoided by,

    I := 512;
    X[I] := 0;

    This is necessary only if the constant offset is too large.

  4. Nested Function Calls -- A function call used as a parameter in another function call may mess up the parameter passing mechanism. For example,

    A := F1(1,2,F2(3,4),5);

    will be compiled as

    A := F1(3,4,F2(3,4),5);

    This can be corrected by a temporary variable and assignment, such as

    T := F2(3,4);
    A := F1(1,2,T,5);
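
For items 1 and 2 above, the hand-written packing and the element-by-element loops might be sketched as follows. The names HI, LO, W, A, B, and SAME are invented for illustration, the declarations and statements are shown together without the surrounding program text, and it is assumed that BOOLEAN, TRUE, and FALSE are predefined as in standard Pascal.

```pascal
(* Hypothetical declarations, for illustration only. *)
VAR HI, LO: 0..127;
    W, I: INTEGER;
    A, B: ARRAY [0..9] OF INTEGER;
    SAME: BOOLEAN;

(* Item 1: pack two characters into one word, then unpack them. *)
(* The largest packed value is 256*127 + 127 = 32639.           *)
W := 256*HI + LO;
HI := W DIV 256;
LO := W MOD 256;

(* Item 2: B := A written as an explicit loop. *)
I := 0;
WHILE I <= 9 DO
BEGIN
	B[I] := A[I];
	I := I + 1;
END;

(* Item 2: SAME := (A = B) written as an explicit loop. *)
SAME := TRUE;
I := 0;
WHILE I <= 9 DO
BEGIN
	IF A[I] <> B[I] THEN SAME := FALSE;
	I := I + 1;
END;
```

The multiplier 256 places the first character in the high-order byte of the word, which matches the byte pointer convention of Section 6, where 2*ADR(C)+1 addresses the low-order byte.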

8. Conclusions

The compiler has some obvious problems, but it is also, we hope, a useful tool for programming on the Nova. A new and better compiler will be written to correct the problems of this compiler. Your comments, suggestions, and experience will be very useful in the revisions of this compiler (and others).

It should be noted that the present compiler is written in Pascal. This allows modifications to be made to the existing system to correct problems of a severe nature, or to produce local user variants of the subset language. Changes which may occur in the near future include changing the compiler call to look on the command line and search for a .PA extension automatically, producing a listing file, detecting and reacting to errors in operating system I/O, and adding some missing syntactic structures (such as the FOR statement, GO TOs, and labels). Your suggestions are encouraged.

9. References

  1. Peterson, J. L., "A Compiler for a Pascal-like Language", Systems Programming Laboratory Note 1, Department of Computer Sciences, The University of Texas at Austin, (September, 1977).

  2. Peterson, J. L., "Code Generation for a Pascal Compiler for a Nova Computer", Systems Programming Laboratory Note 2, Department of Computer Sciences, The University of Texas at Austin, (September, 1977).

Figure 2. List of Error messages
error ( 2) (* symbol table overflow *);
error ( 3) (* found wrong token type *);
error ( 4) (* symbol not a type name *);
error ( 5) (* subrange type conflict *);
error ( 6) (* subscript not simple type *);
error ( 7) (* missing field name *);
error ( 8) (* constant for field name *);
error ( 9) (* delimiter missing in field list *);
error (10) (* type definition missing *);
error (11) (* period after nonrecord symbol *);
error (12) (* illegal field name *);
error (13) (* field name not found *);
error (14) (* subscripted var not array *);
error (15) (* subscript wrong type *);
error (16) (* looking for a variable name *);
error (17) (* too many actuals *);
error (18) (* parameter type mismatch *);
error (19) (* delimiter missing in list *);
error (20) (* too few actuals *);
error (21) (* looking for a factor *);
error (22) (* symbol is wrong kind *);
error (23) (* type conflict in term *);
error (24) (* type conflict in expression *);
error (26) (* uncomparable types *);
error (27) (* AND operand not boolean *);
error (28) (* OR operand not boolean *);
error (29) (* expression not boolean *);
error (30) (* mismatched assignment types *);
error (31) (* illegal symbol on left of assign *);
error (32) (* bad cons definition *);
error (33) (* missing parm name *);
error (34) (* constant for parm name *);
error (35) (* delimiter missing in parms *);
error (36) (* period missing after program *);
error (37) (* undefined symbol *);
error (38) (* multiply defined symbol *);
error (41) (* adding non-integer *);
error (42) (* subing non-integer *);
error (43) (* muling non-integer *);
error (44) (* diving non-integer *);
error (49) (* improper statement terminator *);
error (50) (* illegal genop call *);
error (51) (* subscripted non-array *);
error (206) (* constants not compatible *);

Figure 3. List of Token types
  1  minus -
  2  divide /
  3  multiply *
  4  AND
  5  OR
  6  NOT
  7  left parenthesis (
  8  right parenthesis )
  9  left bracket [
 10  right bracket ]
 11  MOD
 12  DIV
 13  equal sign =
 14  not equal sign <>
 15  greater than >
 16  greater or equal >=
 17  less than <
 18  less or equal <=
 19  assignment  :=
 20  period .
 21  comma ,
 22  semicolon ;
 23  colon :
 24  IF
 25  THEN
 26  ELSE
 27  WHILE
 28  DO
 29  REPEAT
 30  UNTIL
 31  BEGIN
 32  END
 33  CONST
 34  TYPE
 35  VAR
 36  PROCEDURE
 37  FUNCTION
 38  ARRAY
 39  RECORD
 40  OF
 41  FILE
 42  PACKED
 43  EXTERN
 44  FORWARD
 45  plus +
 47  identifier
 48  string "..."