Home Up Questions?




Translating Pascal Into C




James L. Peterson




Information Technology Center
Carnegie-Mellon University
Pittsburgh, PA 15213
(On Leave from The University of Texas at Austin)




July 1984




This paper has been published. It should be cited as

James L. Peterson, ``Translating Pascal into C'', IEEE Software, (July 1984), page 82-86.


Introduction

Traditionally, I have been doing all of my programming in Pascal [Jensen and Wirth 1974], both because it is available on all of the computer systems that I use, and because I find Pascal to be the clearest language in which to write. However, I know of no Pascal compiler which has been specifically designed to generate high-quality object code. Most Pascal compilers stress other aspects of compiler design, such as error recovery or portability, or were academic projects.

C [Kernighan and Ritchie 1978], on the other hand, while not nearly so easy to read or understand, was designed, in part, to allow good object code generation [Ritchie, Johnson, Lesk and Kernighan 1978]. C compilers tend to be particularly good on Unix systems, where the operating system itself is written in C.

There is no intrinsic reason why the code generated by a C compiler would necessarily be better than the code produced by a Pascal compiler. However, in practice, we would expect the same algorithm, written in C and in Pascal, to execute faster in C than in Pascal. The C compiler will probably produce better object code than the Pascal compiler. A reasonable approach for production programs might then be to develop the program in Pascal, for readability and ease of debugging, and then translate the Pascal source into C source for repeated execution.

A Pascal to C Translator

Pascal and C are quite similar languages [Feuer and Gehani 1982]. Both are block-structured procedural languages. Pascal is mostly a subset of C, so translation from Pascal to C is relatively easy. C has a number of operators and structures which have no equivalent in Pascal (such as conditional expressions, or the break statement), so translation from C to Pascal would be considerably more difficult.

A reasonable text editor allows translation of a small program of a few hundred lines of code from Pascal to C in an hour or so. I translated one program which I use often from Pascal to C by hand. It was a program which selected fields from a 20 megabyte data file on a VAX for later processing. Running the Pascal version took about 50 minutes, while the C version of the same program took about 5 minutes. This lead me to believe that translating Pascal programs to C might lead to significantly faster processing, presumably because of the better C compiler.

In fact, a program can be written to translate Pascal to C (mostly) automatically. A design study was done by Nancy Springen [1982], and a first translator was then written by Gad Dafni, at the University of Texas at Austin. I have since rewritten nearly all of the translator. The translator was, of course, written in Pascal and successfully translates itself into C.

The translator uses a top-down recursive descent compilation method to recognize the input Pascal program. For each Pascal construct, the corresponding parsing routine generates an output string which is the equivalent C source code. Most Pascal constructs have a fairly obvious C equivalent, and so the conversion is straightforward.

For example, the Pascal

if boolean-expression
   then statement1
   else statement2

is translated to the C

if (boolean-expression)
        statement1;
   else statement2;

A major aim of the translation is to retain the original flavor of the Pascal program. Thus, although we could replace symbolic constants with their values, it seems more appropriate to retain them as symbolic constants in the C program. The output of the translator has the same structure, statements and variables as the input -- only the language (and syntax) of expression has been changed.

The translation from Pascal to C involves many different changes. The simplest changes are lexical substitutions; operators and keywords in Pascal are replaced by corresponding C operators and tokens, as shown in Table I. In addition, the Pascal keywords do, then, procedure, and function are all replaced with the null string. This lexical substitution is effectively the level of translation suggested by LaLonde and Pugh [1983].

Table I: Lexical Transformations
Pascal C


:= = (assignment)
= == (equality test)
<> != (not equal)
{, (* /* (begin comment)
}, *) */ (end comment)
^ * (pointer dereferencing)
begin {
end }
and &&
or ||
not !
mod %
div /
case switch
record struct
integer int
real float
input stdin
output stdout

The second level of translation involves syntactic transformations. Certain constructs in Pascal have equivalent constructs in C, but with differing orders of the tokens. A simple example is the declaration of variables. In Pascal, a variable declaration is of the form:

name: type;

The equivalent declaration in C reverses the order of the type and the variable name (as well as possibly performing lexical substitutions such as integer --> int) to create the form:

type name;

A third level of translation is context-dependent translation. In some cases, the equivalent C code for a Pascal fragment can only be selected with the help of contextual information. This is generally supplied by examining the symbol table. For example, parameters in Pascal can be passed by value or by reference (var). Parameters in C are always passed by value, but since it is easy to generate the address (&n) of a C variable (n), we generate either p(n) for a Pascal call-by-value, or p(&n) for a Pascal call-by-reference. The declaration of the parameter and its access in the C procedure must, of course, also take this context into consideration: call-by-value parameters are simply referenced as n, while call-by-reference parameters must be dereferenced as *n.

Similar contextual information must be used to translate statements which depend upon type information. For example, the Pascal write(i) statement may be translated to the C printf("%d",i) or printf("%c",i) or printf("%f",i) depending upon the type of i: integer, char, or real.

A fourth level of translation is the semantic expansion of some Pascal constructs into equivalent C constructs. These expansions involve those "higher-level" constructs in Pascal which must be translated down into lower-level constructs in C. For example, the with statement in Pascal has no obvious equivalent in C. There are two possible choices for its translation. One approach would be to remove the with statement completely by replacing all affected field names within the scope of the with with completely qualified names. Thus, the statement,

	with st[i].next[j]
	  do begin
			ref := ref - 1;
			if ref <= 0
			   then release(i,j);
	    end;

could be translated to the C,

	st[i].next[j].ref = st[i].next[j].ref - 1;
	if (st[i].next[j].ref <= 0) release(i,j);

However, this seems to violate the purpose of the with statement in Pascal. Accordingly, we suggest the following C code:

	{
	  register struct ... *ptr;
	  ptr = &st[i].next[j];
	  ptr->ref = ptr->ref - 1;
	  if (ptr->ref <= 0) release(i,j);
	}

This code creates a temporary pointer which is then used to access the fields of the specified record.

As would be expected with a powerful language, such as C, there are several possible translations for many Pascal constructs. The Pascal

repeat
	...
until boolean;

for example, must be semantically expanded, since there is no direct equivalent in C. It can be translated as either of the following C code segments (at least).

	do {
	      ...
	   } while (!(boolean));

	for (;;)
	  {
		...
		if (boolean) break;
	  }

Both of these translations are semantically correct. To choose between them, we fall back to our motivation in translating from Pascal to C. We wish to produce an equivalent program for production use which can be read, understood, and modified, just as the original Pascal program. This means that the C program should capture the structure and "feel" of the original Pascal program. In this case, our motivation argues for the do { ... } while form in C.

A similar situation arises with const and type definitions. A const definition

	const max = 10;

can be translated as

	#define max 10

or

	int max = 10;

While these are semantically equivalent, they differ in significant ways. The #define may result in compile-time optimizations; while the use of a run-time initialized variable would allow the programmer to modify the value of the "constant", either deliberately or accidentally. To preserve the closest match to the Pascal meaning, we choose the #define. Similarly, type definitions in Pascal are converted to the C typedef construct.

The use of #define and typedef on the other hand, introduces scope problems. The #define is processed by a macro preprocessor. Hence it has global scope. This can cause problems if the same name is redeclared in an inner scope in a Pascal program. Similarly, types defined by typedefs cannot be redeclared in C. Both of these problems are resolved by a consistent renaming of any names in the Pascal program which would cause scoping problems in C. Renaming is also used whenever a Pascal variable name would conflict with a C reserved word.

A final stage of the translation is the creation of the environment for the newly converted program. Some portions of the translation from Pascal to C are easiest to handle by library subroutines. For example, the eof and eoln functions of Pascal can be easily simulated in C by a subroutine or macro. Similarly, assignments or comparisons of large objects (such as arrays or records) are translated into a call on a library routine to copy or compare n bytes.

Finally, there are some Pascal constructs which require programmer intervention. For example, Pascal allows deeply nested procedures; C allows no nesting. It is possible, prehaps with renaming, to simply unnest any nested procedures. However, If there are references to non-global, non-local variables, recoding is necessary. This situation can be corrected either by passing the variables as parameters or declaring them global, but each has its disadvantages.

A similar situation exists with the logical operators in Pascal and C. The logical operators in C are conditional; in an expression like (a && b), the value of b is evaluated only if a is true. In Pascal, both a and b are always evaluated (at least in theory). This distinction is important only if b may have side-effects. On the assumption that this is rare, we ignore the distinction when translating from Pascal to C. A simple macro/function call could be used in C if it was crucial that both components be evaluated.

An Example and Comparison

Once the translator was complete, I tested it on a simple program. Number.p is a Pascal program which prepends a line number to the beginning of each line of its input as it is copied to its output.

program number (input, output);
 (* add line numbers to each line of an input file *)
var
	n: 0..999999;
	c: char;
begin
	n := 0;

	while not eof(input)
	   do begin
			n := n + 1;
			write (n:6,' ');

			while not eoln (input)
			   do begin
				read (c);
				write (c);
			      end;
			writeln;
			readln;
		end;
end.

If number.p is compiled by the Berkeley Pascal compiler on a SUN 1.5 workstation running 4.1c Berkeley Unix, and executed on a file of 148,151 bytes and 5861 lines, it takes 66.3 seconds of cpu time and 25,745 bytes of memory.

Translating this mechanically to C with our Pascal-to-C translator, gives the file number.c.

/* program number */
#include <stdio.h>
#include "pascal.h"
 /* add line numbers to each line of an input file */
int n;
char c;

main(argc,argv)
int argc;
char **argv;
{
   n = 0;
   while (!eof(stdin))
      {
         n = n + 1;
         fprintf(stdout,"%6d ",n);
         while (!eoln(stdin))
            {
               fscanf(stdin,"%c",&c);
               fprintf(stdout,"%c",c);
            }
         putc('\n',stdout);
         readln(stdin);
      }
}

Running this program on the same input, the C program took 132.0 seconds of cpu time and 22,444 bytes of memory. Thus, although we saved a little memory, the program took twice as long to execute!

This seems a little strange. Looking at the C program, the only coding which seems "strange" is the reading and writing of the characters within each line. The Pascal read(c) was translated into the equivalent fscanf(stdin,"%c",&c) in C. Similarly the write(c) in Pascal was translated to fprintf(stdout,"%c",c) in C. Reading and writing a single character is a common special case however and C provides the more direct getc and putc for this special case. An easy pattern matching can change all such reads and writes of a single character into the getc and putc forms. A particular advantage of these special forms is that they are implemented by in-line macro expansion. This substitution provides the modified number.c (Version 2):

/* program number */
#include <stdio.h>
#include "pascal.h"
 /* add line numbers to each line of an input file */
int n;
char c;

main(argc,argv)
int argc;
char **argv;
{
   n = 0;
   while (!eof(stdin))
      {
         n = n + 1;
         fprintf(stdout,"%6d ",n);
         while (!eoln(stdin))
            {
               c = getc(stdin);
               putc(c,stdout);
            }
         putc('\n',stdout);
         readln(stdin);
      }
}

This version of number.c took only 18.3 seconds and 18,052 bytes of memory.

Additional editting to bring the program into the more common flavor of C programs produces another version of number.c (Version 3) without the calls to the Pascal eof and eoln.

/* program number */
#include <stdio.h>
#include "pascal.h"
 /* add line numbers to each line of an input file */
int n;
char c;

main(argc,argv)
int argc;
char **argv;
{
   n = 0;
   while ((c=getchar())!=EOF)
      {
         n = n + 1;
         fprintf(stdout,"%6d ",n);
         while (c!='\n')
            {
	       putchar(c);
	       c = getchar();
            }
         putchar('\n');
      }
}

This version took only 16.3 seconds of execution time and 18,013 bytes of memory.

The main advantage in execution time then seems to come from the implementation of character I/O by macros, rather than formatting procedures. In an I/O-bound program, C allows substantial reduction in the run-time overhead from I/O, and hence improved total execution time.

Testing the same process on a simple cpu-bound program showed that the Pascal version required 5.9 seconds of cpu time, while the C version required 4.6 seconds, a 22% speed-up.

Another comparison can be obtained from the Pascal-to-C translator itself. The translator has translated itself from Pascal to C. Both source versions are about 6000 lines of code, but the C version requires only 59K of object code space, while the Pascal version requires 99K. When the translator converts itself from Pascal to C, the Pascal-compiled version takes 527 cpu seconds, while the C-compiled version takes 408 cpu seconds (77%). In addition, the C version can be modified to change simple procedures into in-line macros, eliminating the procedure call overhead. This can further reduce execution time, although at the cost of some increase in space.

Summary

I believe that most C compilers probably produce better object code than most Pascal compilers. My tests so far confirm this. Even more important though, for many programs, the C input/output system, with in-line macro substitution for character input/output, can be substantially more efficient for character-oriented I/O-bound programs. Thus, I expect both smaller and faster programs by converting my library of Pascal programs to C. The Pascal-to-C translator will make this translation much easier.

In addition, I am considering translating several large programs, such as a spelling corrector [Peterson 1980] and a text formatter into C, to improve running time and lower memory requirements.

Similar programs could also be written to translate other languages, such as Fortran or Algol, into C, or Pascal into Ada.

References

Home   Comments?