Syntax for Literature References

James L. Peterson

MIT Laboratory for Computer Sciences
545 Technology Square
Cambridge, MA 02139

September 1978

References in a published paper serve an important function in computer science. The field is large and growing rapidly. The proliferation of journals, conferences and sources of technical reports makes it increasingly difficult for a researcher to stay abreast of all the work in a particular field, even a relatively narrow one. References serve as pointers into the computer science literature.

Because references play this role, they must be clear and easily understood. This is an editorial concern and a continuing problem for the major journals. Since ACM and IEEE publish the most important journals in computer science, they should take the lead in clearly defining the syntax of a reference. However, a check of the directions to authors for Communications of the ACM [14] and IEEE Transactions on Computers [12,15,16] reveals that only a few examples are given, with the expectation that the author will generalize from them as necessary. A survey of recent papers shows, however, that this results in extremely inconsistent syntax in reference lists.

Therefore, in this note, I propose a standard for the syntax of references. I further urge the editors of both ACM and IEEE publications either to accept this proposal as a standard for their journals or to jointly prepare and publicize an acceptable alternative standard.

You may argue that this is not an appropriate problem for computer scientists and that some other group, such as librarians, should define such a standard. I would reply, first, that they have not done so, and second, that they do not generally understand the variety of material that computer scientists reference. Further, it is uniquely appropriate for computer science to define such a standard, for two reasons: (1) the work on formal languages provides us with the tools for defining it, and (2) there is a desire to create machine-readable bibliographies, which requires a well-defined syntax for bibliographic entries.

The specific proposal given below results solely from my own experience in compiling reference lists and as such may well be unsuited for general use (although I don't think so). I believe it has a logic and consistency that greatly simplify the problems of definition and extension. It is driven by the following goal:

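In presenting the proposed standard, I use a modified BNF as proposed by Wirth [11]. In this notation, | separates alternatives, square brackets [ ] enclose an optional item, and braces { } enclose an item that may be repeated zero or more times.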

The Proposed Standard

Bibliographies

A bibliography is a list of references, sorted alphabetically, with each reference numbered.

<bibliography> = { <number>. <reference>. }
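As an illustration of the <bibliography> production, the following minimal Python sketch numbers references after sorting them alphabetically. It assumes each entry arrives as a hypothetical (sort key, formatted reference) pair; the note says only that the list is sorted alphabetically, so using something like the first author's last name as the key is an assumption.

```python
from typing import List, Tuple


def format_bibliography(entries: List[Tuple[str, str]]) -> str:
    """Render { <number>. <reference>. } from hypothetical (sort_key, reference) pairs."""
    # Assumption: case-insensitive alphabetical order on the supplied sort key.
    ordered = sorted(entries, key=lambda entry: entry[0].lower())
    lines = []
    for number, (_, reference) in enumerate(ordered, start=1):
        # Each item is "<number>. <reference>." with a period after both parts.
        lines.append(f"{number}. {reference}.")
    return "\n".join(lines)


print(format_bibliography([("Zed", "Z. Zed, \"Second\", Example Press, (1977)"),
                           ("Able", "A. Able, \"First\", Example Press, (1978)")]))
```

The reference strings in the usage line are placeholders, not real works.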

The item of interest to us is, of course, the syntax of the <reference>.

A reference consists of three basic parts: an author (or list of authors), a title, and a source (or list of sources). These three fields are separated by commas.

<reference> = <author list>, <title>, <source list>
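The three-part structure can be sketched as a join of already-formatted fields. This is a minimal sketch assuming each part has been rendered separately; because the author list may be empty, the sketch assumes no leading comma is written in that case, a detail the grammar leaves open.

```python
def format_reference(author_list: str, title: str, source_list: str) -> str:
    # <author list>, <title>, <source list>, separated by commas.
    # Assumption: an <empty> author list contributes no field and no comma.
    parts = [part for part in (author_list, title, source_list) if part]
    return ", ".join(parts)


print(format_reference("A. Author", '"An Example Title"', "Example Press, (1978)"))
```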

Author

An <author list> describes the authors of the referenced item and is generally a list of names. Three special cases can be identified:

Another possibility is that the reference is to an edited work rather than to a work written by a set of authors. This is indicated by following the list of names with the parenthesized (Editor) or (Editors), indicating that the preceding names are editors rather than authors.

<author list> = <empty>
                     | <name> [ (Editor) ]
                     | <name> { , <name> } and <name> [ (Editors) ]
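The three alternatives of <author list> map directly onto a small formatter. This sketch assumes the names are already formatted <name> strings; note that the production puts "and" before the last name with no comma in front of it.

```python
from typing import List


def format_author_list(names: List[str], editors: bool = False) -> str:
    """Render an <author list> from already-formatted <name> strings."""
    if not names:
        return ""  # the <empty> alternative
    if len(names) == 1:
        text = names[0]
    else:
        # <name> { , <name> } and <name>: commas between all but the last two.
        text = ", ".join(names[:-1]) + " and " + names[-1]
    if editors:
        # Singular or plural marker, matching the number of names.
        text += " (Editor)" if len(names) == 1 else " (Editors)"
    return text


print(format_author_list(["A. Able", "B. Baker", "C. Cole"], editors=True))
```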

Names

Names vary greatly in form across current reference lists. We prefer to write a name as any known initials followed by the last name.

<name> = { <initial> . } <last name>
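Because the aim includes machine-readable bibliographies, it may help to see the <name> production both as a formatter and as a recognizer. This sketch assumes an <initial> is a single capital letter and a <last name> is a capitalized alphabetic string (hyphens and apostrophes allowed); the single space before the last name is also an assumption, since the grammar does not specify spacing.

```python
import re
from typing import List

# Assumed shapes: "X." initials (optionally space-separated), then a last name.
NAME_PATTERN = re.compile(r"(?:[A-Z]\. ?)*[A-Z][A-Za-z'\-]*")


def format_name(initials: List[str], last_name: str) -> str:
    # { <initial> . } <last name>: each initial gets a period.
    prefix = "".join(f"{initial}." for initial in initials)
    return f"{prefix} {last_name}" if prefix else last_name


def is_name(text: str) -> bool:
    # True only if the whole string matches the assumed <name> shape.
    return NAME_PATTERN.fullmatch(text) is not None


print(format_name(["J", "L"], "Peterson"), is_name("Peterson, J.L."))
```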

Title

The title is copied directly from the referenced work and enclosed in quotation marks:

<title> = " <title of work> "

All principal words start with a capital letter. We see no reason to distinguish between different types of titles, such as titles of books, papers, or reports; all titles have this one simple form.
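The note does not define which words count as "principal", so the following is only a sketch under an assumption: a small, hypothetical set of articles, conjunctions and short prepositions is kept lowercase (except as the first word), and every other word gets an initial capital. The rest of each word is left untouched, so acronyms survive.

```python
# Hypothetical minor-word list; not part of the proposed standard.
MINOR_WORDS = {"a", "an", "and", "as", "at", "by", "for",
               "in", "of", "on", "or", "the", "to"}


def format_title(title_of_work: str) -> str:
    """Render <title> = " <title of work> " with principal words capitalized."""
    words = title_of_work.split()
    out = []
    for index, word in enumerate(words):
        if index > 0 and word.lower() in MINOR_WORDS:
            out.append(word.lower())
        else:
            # Capitalize only the first letter so words like "BNF" are kept.
            out.append(word[:1].upper() + word[1:])
    return '"' + " ".join(out) + '"'


print(format_title("syntax for literature references"))
```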

Source

The source of the referenced item is the most critical part of a reference. Because of the diverse nature of the source of computer science literature, this is also the most complex part of our syntax. We can identify the following sources:

Journals

If a referenced article appears in a journal, the reference has the following form:

<journal> = <journal title>, Volume <number>, [ Number <number> , ]
                       [ Part <number> , ] <date> [ , <page numbers> ]

The <journal title> is underlined (or set in italics) and is not abbreviated. Similarly, the keywords Volume, Number, and Part are not abbreviated; this makes the reference easier to read. Whenever possible, complete information giving both the volume and the number should be provided. Some journals use Part instead of Number (as in [10]), while others use both Number and Part (as in [5]).
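A minimal sketch of the <journal> source follows. It assumes the date and page numbers are passed in already formatted, leaves underlining or italics to the typesetter, and writes the comma before the page numbers only when pages are given (an assumption about the optional field). The journal name in the usage line is a placeholder.

```python
from typing import Optional


def format_journal(journal_title: str, volume: str, date: str,
                   number: Optional[str] = None, part: Optional[str] = None,
                   pages: Optional[str] = None) -> str:
    # Keywords Volume, Number and Part are spelled out, never abbreviated.
    fields = [journal_title, f"Volume {volume}"]
    if number is not None:
        fields.append(f"Number {number}")
    if part is not None:
        fields.append(f"Part {part}")
    fields.append(date)
    if pages is not None:
        fields.append(pages)
    return ", ".join(fields)


print(format_journal("Journal of Examples", "3", "(June 1977)",
                     number="2", pages="pages 10 - 20"))
```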

Books

The source of a book is its publisher.

<book> = <publisher name>, <date> [ , <number of pages> ]

An alternative form includes both the name and the city of the publisher, but the city information is increasingly useless. The front matter of [4], for example, gives four cities (Reading, Massachusetts; Menlo Park, California; London, England; Don Mills, Ontario) as the location of the publisher (Addison-Wesley). Which should be used? We assume that the purpose of the city information is to make it easier to order a book from its publisher. If that is the case, however, the city alone is insufficient; a complete address is needed. Since publisher addresses are available in references such as [2] and [13], only the name of the publisher need be given in a reference.
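Following the argument for dropping the city, a <book> source reduces to the publisher's name, the date, and an optional page count. This sketch assumes the date is already a formatted <date>, keeps the page count as a string because <number> may contain nonnumeric characters, and omits the final comma when no count is given. "Example Press" is a placeholder.

```python
from typing import Optional


def format_book(publisher_name: str, date: str,
                number_of_pages: Optional[str] = None) -> str:
    # Publisher name only; no city or address is included.
    fields = [publisher_name, date]
    if number_of_pages is not None:
        fields.append(f"{number_of_pages} pages")  # <number> pages
    return ", ".join(fields)


print(format_book("Example Press", "(1977)", "250"))
```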

Technical Reports and Theses

A reference to a technical report needs an identifying report name or number plus the issuing agency.

<report> = <report name>, <issuing agency>, <date> [ , <number of pages> ]

The <report name> can indicate a technical report, internal memo, thesis, or any of the multitude of names given to reports of research. The <issuing agency> field may include the address of the issuing agency if this is necessary to identify and locate the referenced report (as in [9]).
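The <report> source differs from a book only in carrying a report name or number and an issuing agency. In this sketch the agency string may already include an address, as the text allows; the report number and agency in the usage line are placeholders, and omitting the trailing comma when there is no page count is an assumption.

```python
from typing import Optional


def format_report(report_name: str, issuing_agency: str, date: str,
                  number_of_pages: Optional[str] = None) -> str:
    # <report name> may be a report number, memo number, or thesis type.
    fields = [report_name, issuing_agency, date]
    if number_of_pages is not None:
        fields.append(f"{number_of_pages} pages")
    return ", ".join(fields)


print(format_report("Technical Report TR-00", "Example Laboratory",
                    "(September 1978)", "12"))
```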

Conference Papers

A reference to a conference paper should be a reference to a particular part of the conference proceedings. The proceedings themselves have an editor, a title and a source, and are thus also a bibliographic entry.

<contained reference> = in <reference> [ , <page numbers> ]

This introduces recursion into our syntax. The same syntax is used to reference any item contained in another item, such as a chapter of a book (as in [1]). A conference paper is generally contained in a proceedings volume, which is referenced as a book (as in [3] and [7]).
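The recursion shows up naturally as a data structure in which a contained source holds a complete reference. This sketch supports only two kinds of source (a book and a contained reference) to keep it short; the class names and the placeholder proceedings are hypothetical, and dates are assumed to be preformatted.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Book:
    publisher_name: str
    date: str  # an already formatted <date>, e.g. "(1978)"


@dataclass
class Contained:
    container: "Reference"  # the enclosing work is itself a full reference
    page_numbers: Optional[str] = None


@dataclass
class Reference:
    author_list: str
    title: str
    source: Union[Book, Contained]


def render(ref: Reference) -> str:
    # author list, title, source; recursion enters through Contained.
    return f"{ref.author_list}, {ref.title}, {render_source(ref.source)}"


def render_source(source: Union[Book, Contained]) -> str:
    if isinstance(source, Book):
        return f"{source.publisher_name}, {source.date}"
    # in <reference> [ , <page numbers> ]
    text = "in " + render(source.container)
    if source.page_numbers:
        text += f", {source.page_numbers}"
    return text


proceedings = Reference("A. Editor (Editor)", '"Proceedings of an Example Conference"',
                        Book("Example Press", "(1978)"))
paper = Reference("B. Author", '"An Example Paper"', Contained(proceedings, "pages 5 - 9"))
print(render(paper))
```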

Other

A final category includes any other sources. One that comes to mind is private communication between researchers. It is not obvious that this is an appropriate reference: since such material is not generally available to the research community as a whole, it should probably not be referenced in this way, and a direct citation in the text would probably be better.

However, our syntax can easily be extended to include these forms of reference. Such sources can be indicated by one of the following key phrases:

<other> = <phrase>, <date>

<phrase> = private communications
                | available from the author
                | in preparation
                | submitted for publication
                | to appear in <journal title>
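Since <phrase> is a closed set of alternatives, a formatter can reject anything else. This sketch treats "to appear in" as the only phrase that takes an argument (the <journal title>); raising ValueError for unknown phrases is a design choice of the sketch, not part of the proposal.

```python
from typing import Optional

# The argument-free alternatives of <phrase>.
FIXED_PHRASES = {
    "private communications",
    "available from the author",
    "in preparation",
    "submitted for publication",
}


def format_other(phrase: str, date: str, journal_title: Optional[str] = None) -> str:
    # <other> = <phrase>, <date>
    if phrase == "to appear in":
        if not journal_title:
            raise ValueError("'to appear in' requires a <journal title>")
        phrase = f"to appear in {journal_title}"
    elif phrase not in FIXED_PHRASES:
        raise ValueError(f"not a recognized <phrase>: {phrase!r}")
    return f"{phrase}, {date}"


print(format_other("to appear in", "(1979)", "Journal of Examples"))
```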

Source list

Given these definitions, we can define a simple source as:

<simple source> = <journal>
                         | <book>
                         | <report>
                         | <contained reference>
                         | <other>

Some items appear in several different sources. Since a reference is meant to indicate where the referenced item is available, we should give all known sources, allowing the researcher to use whichever is most convenient. Thus a reference uses a source list rather than a single simple source (as in [6]).

<source list> = <simple source> { ; also <simple source> }
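To list every known source, the simple sources are chained with "; also". This sketch assumes any number of sources may be chained and that at least one is required; the inputs are taken as already formatted <simple source> strings.

```python
from typing import List


def format_source_list(simple_sources: List[str]) -> str:
    # Assumption: one or more sources, joined by "; also".
    if not simple_sources:
        raise ValueError("a <source list> needs at least one <simple source>")
    return "; also ".join(simple_sources)


print(format_source_list(["Example Press, (1978)",
                          "Technical Report TR-00, Example Laboratory, (1977)"]))
```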

Remaining Nonterminals

Most of the remaining nonterminals are easily defined. Dates always appear in parentheses and generally consist of a month and a year.

<date> = ( [<month>] <year> )

Occasionally a complete date (month, day and year) or several months are needed (as in [5]), so the complete syntax of this item is lengthy, though not particularly interesting.
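A recognizer for the common case, ( [<month>] <year> ), can be written as a regular expression. This sketch assumes months are spelled out in full and years have four digits, and it deliberately leaves out the longer forms (a day, or several months) mentioned above.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Matches "(1978)" or "(September 1978)"; abbreviations are rejected.
DATE_PATTERN = re.compile(rf"\((?:(?:{MONTHS}) )?\d{{4}}\)")


def is_date(text: str) -> bool:
    return DATE_PATTERN.fullmatch(text) is not None


print(is_date("(September 1978)"), is_date("Sept. 1978"))
```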

Pages are given either by <number of pages> or by <page numbers>, with the following syntax:

<number of pages> = <number> pages

<page numbers> = page <number>
                         | pages <number> - <number>
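The two page forms translate into two small formatters. Numbers are kept as strings because, as noted below the grammar, they may contain nonnumeric characters; treating a range whose ends are equal as a single page is an assumption of this sketch.

```python
def format_page_numbers(first: str, last: str = "") -> str:
    # page <number> | pages <number> - <number>
    if not last or last == first:
        return f"page {first}"
    return f"pages {first} - {last}"


def format_number_of_pages(count: str) -> str:
    # <number> pages
    return f"{count} pages"


print(format_page_numbers("A-3", "A-9"), "/", format_number_of_pages("xii+250"))
```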

<number> can be almost any string, since technical report numbers, page numbers and so on may include nonnumeric characters (as in [8] and [9]).

Most of the remaining nonterminals are essentially arbitrary alphabetic strings.

Conclusions

It is possible to give a complete syntax for the form of a literature reference, and to define that syntax so as to make references clear and understandable. The adoption and publication of a standard form for references in ACM and IEEE publications would help eliminate some of the confusion and difficulty in researching the computer science literature. It is appropriate for both of these leading professional organizations to take steps toward adopting a standard similar to the one proposed here, or toward clearly defining an alternative standard.

References