Notes on a Workshop on Distributed Computing

James L. Peterson

MIT Laboratory for Computer Sciences
545 Technology Square
Cambridge, MA 02139

12 - 13 October 1978

Background

A workshop on distributed computing was held 12-13 October 1978 at the Harvard Faculty Club in Cambridge, Massachusetts. The goal of the workshop was to encourage discussion, investigation, and exploration of the fundamental problems of distributed computing. The workshop was sponsored by the Laboratory for Computer Science of M.I.T., and organized by Barbara Liskov and Liba Svobodova, of M.I.T. Approximately twenty-five computer scientists with active interest in and experience with the area of distributed computing attended the workshop. The Appendix lists the attendees.

The workshop was, by design, highly informal. To start the workshop, Dave Clark presented the basic concepts being developed by the Distributed Systems Group (DSG) of the Laboratory for Computer Science. This produced a number of comments, questions, and arguments which were then used to provide a loose framework for the remainder of the workshop. The remaining sessions consisted of informal extemporaneous presentations and discussions as various issues were raised and considered by the group. Figure 1 lists the session structure.

Figure 1: Workshop on Distributed Computing - Session Content

Session I. Introductory Comments (Liskov)
DSG Approach to Distributed Computing (Clark)
Session II. Introduction of participants and their interests/backgrounds
Discussion of structure of workshop
Session III. Assumptions and Parameters (Redell, Popek)
Session IV. Transactions (Gray) Messages versus procedure calls (Sturgis)
Session V. Naming in Distributed Systems (Reed)
Transparency
Session VI. Distributed Jobs (Low)
Reliability (Svobodova)
Session VII. Recoverability (Shrivastava)
StarOS (Jones)
Session VIII. Cheap Processes and Messages (Hewitt, Gray)
Data Flow Semantics (Dennis)

In this note, we report on the issues that were discussed at the workshop. The field of distributed computing is still very much in its infancy, and as such, much of the discussion at the workshop concerned the problems which have been encountered or are envisioned; definitive solutions to these problems are not yet available, but rather several approaches to solutions are possible.

Introduction

Clark's remarks were intended to consider why distributed systems exist (or will exist), and from these considerations, to deduce some of the constraints on the resulting structures and languages for distributed computing. Clark established first of all that the interest of DSG is in information intensive applications with naturally distributed information, as opposed to distributed computing for improved performance (such as off-loading).

One very strong constraint on these systems is the need for autonomy. The distributed components of a distributed system should be autonomous. This includes the ability to remove a machine from the network (although this ability is needed for reliability anyway), but more generally, it means that the internal algorithms and organization of information can be freely selected at each node, independently of other nodes in the network. Decisions about what information is to be kept, how it is to be organized, how it is to be processed, and for what purposes it may be used are all to be locally decided.

The requirement of autonomy must be balanced against the need for coherence. A distributed system should not be distributed anarchy. Thus, certain conventions and protocols must be maintained, but only where two nodes interact, and where a choice exists between autonomy and coherence, Clark would give the choice to autonomy.

The system under development by DSG at M.I.T. does not allow information to be moved between nodes dynamically by the system, but maintains that the location and movement of information is to be controlled by the application developers. Protection is needed only between separate nodes, but not necessarily within a node. This implies that local data is inherently different from remote data, and this difference will show through to the application builder.

Clark went on to describe briefly the specific system under development at M.I.T.. This system is described in the 1978 progress report of DSG [Clark, et. al., 1978], available from the authors.

Several points were made during Clark's presentation which elicited comment and debate. One such topic was: Why a distributed system? The comments on this question included:

Growth. A distributed system is easier to expand as needed. This is in contrast to other systems which cannot expand except by replication. Replication however produces two distinct systems, rather than one larger system.
Reliability. Distributed systems potentially can be more reliable due to their multiplicity of parts.
Structure. Distributed systems can reflect the organizational structure which they serve.
Protection. Distributed systems can provide more security than a centralized system.

This last point (protection) was hotly debated and produced the statement that security results from a logically distributed system, and not from physical distribution. It was argued however that physically distributed systems, where your part resides at your site, provide more apparent protection of information by providing physical control over that part of the system; information can flow in and out only over easily identified wires. This produces "warm feelings" in the user of the system. This explicit control over local processing and information is presumably the major reason for physically distributed systems.

However, it was generally recognized that physical distribution is not as important as logical distribution. This logical distribution could be implemented on one processor, on several processors at one site, or at several processors at different sites. The physical separation of processors affects mainly assumptions about bandwidth and response time. It was suggested that these parameters might greatly influence how a distributed system is organized.

System Parameters

Redell and Popek produced the following list of system parameters and assumptions. The discussion of these topics indicated that there is a very wide spectrum in distributed systems, and this may result in vastly different logical structures.

a. Bandwidth/delay - from memory speed to disk speed to dial-up connections.
b. Degree of node autonomy - a communications-oriented free-standing operating system in each computer versus an integrated system controlling all processors.
c. Reliability - how important is it and how is it provided, particularly at what cost and visibility to the programmer.
d. Shared Memory - this is a separate issue from a Bandwidth/delay since data abstractions can provide or hide this in the logical structure with or without physically shared memory.
e. Relation between the node size and the task size - the activities of the system are broken down into separately schedulable units of execution (tasks). The physical network node must be large enough to execute at least one of these tasks. If the node is large enough to execute several, multiprogramming may be necessary. Alternatively, there may be a variety of node sizes so that some tasks can only be run on one or a few nodes because of size (such as for compilation), or the provision of special services (such as driving special devices).

A related issue is the question of the cost of processes and messages. Most approaches to distributed systems are based on the concept of multiple processes communicating by exchanging messages. The feasibility of this approach is largely dependent upon having cheap processes and cheap messages.

Gray argued that, currently, neither of these are cheap. Processes, at least on IBM systems, are very big, and expensive. A process has memory, registers, protection information, virtual memory information, accounting, timing, files, and so on. Thus a process switch can be very expensive. The growth of operating system services and features tends to make processes even more expensive by associating more and more state information with each process. Cheap processes would seem to require minimal operating system support (requiring minimal state information).

Messages are a similar problem. Message passing appears to be very expensive. Although it was difficult to establish uniform definitions of what was being measured, a time of about 20 milliseconds was quoted as the round trip time to send a (null) message and receive an answer on the Xerox Alto systems, with similar numbers put forth for IBM systems and Multics. This is both a surprisingly high and surprisingly uniform time. Sturgis indicated that, on the Xerox system, several recoding efforts had failed to significantly change this number. Still no one was able to indicate exactly where the time went, or why 20 ms should be a universal lower bound on message passing time.

Messages versus Procedures

A related topic, brought up by Sturgis, was a recent paper by Lauer and Needham presented at IRIA [Lauer and Needham, 1978] The Lauer and Needham paper argued that, for real (single processor) systems, message passing and procedure calling are essentially equivalent, and hence either construct can be used for the basic implementation of an operating system. This provoked a heated argument about the extension of this duality to distributed systems. The suggestion was that communication should be by a remote procedure call rather than sending a message and waiting for a return. This reduces the semantics at the programmers' level to only the semantics of the procedure call, simplifying the programmers' view by eliminating the need for messages.

One complaint was that treating message passing as procedure calling forces every request (procedure call) to have a reply (procedure return). It was argued that there exist systems which are structured as a pipeline, where all information flows in one direction only and this request/reply model is inappropriate.

Also, it was argued that remote procedure calls or messages are semantically different from local procedure calls. For local procedure calls, the responses from a call are of two types:

(a) The procedure completed successfully, and here is the answer.
(b) Some error definitely occurred; no answer is available, but here is the type of error encountered.

In either case, positive knowledge of the result of the procedure call exists. Even in cases of an infinite loop in the procedure, a timer can be set and a result of class (b) occurs.

For remote procedure calls or messages, these two types of responses to a request still occur, but another result is possible:

In this case, it is unknown if the request message was lost (procedure call not made), the server failed but was unable to respond so, or the server completed but the reply message was lost (procedure return not made). It was claimed that this third result class is a semantic difference between local and remote procedure calls.

Transparency, Naming, and Binding

The differences cited above for local and remote procedure calls hold for any remote reference across the net. Thus it was argued that remote references (messages) are fundamentally different from local references. This raises the problem of transparency: should accesses to remote objects appear differently to the programmer from accesses to local objects? While there appeared to be strong feelings on this subject, no agreement was reached. It was agreed only that this problem is tied to two other problems: naming and binding.

Within the system, objects (data and processes) need names. However, there may be several different levels of object names including:

local addresses
system wide addresses
routing information
complete path names
unique (system wide) identifiers
symbolic names

It is important to realize that these names are potentially quite different. Failure to appreciate these differences can lead to considerable confusion. Any conversation about naming must maintain a consistent meaning for the concept of a name.

Objects will generally be referred to by unique system wide identifiers (uid). This uid will be bound to some physical address to allow access to the physical object. An important aspect of this binding is the frequency of binding, and the set of actions which can occur between a binding and the subsequent unbinding. If the object cannot be moved after it is bound to a uid, then the uid can contain location and routing information. On the other hand, if the object can be moved, then the uid must be location independent, and local and remote accesses must be transparent (since the local or remote nature of the access cannot be determined).

Binding time can be delayed by the introduction of indirection, with a resulting loss of efficiency. Also, transparency can be provided by an additional level of user software, if it is not supported by the system.

Much of the group seemed to support the view that data objects exist within (data abstraction) modules providing a limited set of operations on the objects. Object names would consist of an (module name, local name) pair. Objects could only be manipulated by presenting the module with the object name; the module would use the local name to identify the particular object. Of course, this view rests on the crucial assumpution that a data abstraction is implemented entirely in one node -- an assumption that some of the group was unwilling to make.

Naming and transparency also affect reliability and robustness. If objects cannot move and locations are not transparent, then these objects will become inaccessible if the processor or links needed for these objects fail; the system cannot reconfigure itself to recover from a failure.

Reliability and Recovery

The topics of reliability and recovery were considered by Svobodova and Shrivastava.

Svobodova discussed the multiple update problem for copies of an object from two points of view, depending upon the reason for providing the copies. Copies can be maintained for reliability: a master plus k back-ups. In this case, all updates can be funneled through the master into the sequential back-up-copy managers. In the other case, multiple copies are provided to increase efficiency of access: accessing the closest copy will reduce communication delays while the existence of multiple copies will allow concurrent access increasing bandwidth. In this case, it is claimed, most programs can accept a copy which is slightly out-of-date in exchange for the faster access; a program which requires the most current copy can always access the master file. Changes to the copies need be made only within a reasonable time after the master is changed.

Shrivastava talked about the recovery of concurrent processes based on three techniques:

Information flow constraints -- a sphere of control or fire wall is created. If an error is found in the sphere, it is restarted. At the end of the sphere, all changes are committed. If an error is found after exit from the sphere of control, then post-committal recovery and compensation must occur.
Information flow analysis -- the messages between processes are logged. When an error is discovered, this log is examined and all affected processes are backed-up, using checkpoints to allow the system to restart. This is the approach considered in [Russell, 1976] and being implemented by Popek at UCLA.
Information validation -- assumes that the information sent to a process is valid, so only the processing by the receiving process is incorrect. Recovery therefore involves only the receiver and not the sender.

Transactions, a reliability concept similar to spheres of control, was advanced by Gray. A transaction is semantically atomic, it either happens (commits) or does not happen (aborts). The important point is that messages and data cannot be lost or partially changed from the point of view of the application programmer. (The case (c) in the discussion of procedures/messages above is not allowed). From the programmer's point of view a transaction might look like:

	begin transaction
		Send ...
		Receive ...
		Send ...
		Send ...
		...
	end transaction (commit or abort)

In a transaction either everything happens or nothing happens. Thus the transaction becomes a unit of recovery.

Real Systems

Three "real" systems were discussed to illustrate problems with distributed systems.

IBM's CICS/ISC system implements distributed transaction processing. CICS/ISC is a distributed data management system which allows transactions to act on "non-local" objects (databases, queues, or terminals) by translating remote procedure calls into messages to remote servers. Key aspects of CICS/ISC are:

Remote servers (agents) are automatically created for a transaction as needed.
Non-local actions are translated into messages to remote agents which in turn execute the action and convert the response into a message. The application program cannot distinguish "local" and "remote" procedure calls.
The two-phase commit protocol is used to coordinate the completion of all agents of a transaction.

Although it is attractive to assume that the job with sole responsibility for retaining certain data or performing some processing is wholly contained in one physical node of the distributed system, that assumption cannot always be made. Rovner and Jones discussed different aspects of the distributed job issue. Their remarks were based on their respective systems which are in an advanced state of development.

Rovner focused on the issue of restarting or continuing a job after recovering from an error. A job is defined as a collection of processes communicating via messages. In the University of Rochester system each distributed job has one controlling process which is responsible for handling errors encountered by all processes in the job. A registry of notification is used to specify which process should be notified in case of process or physical node failure. In addition, each process name includes an invocation number. If a node should fail and be brought up, then the servers implemented by the node are restarted. Their invocation numbers will differ from their previous invocation numbers and so the requesting processes can be notified of the server failure. They may then respond appropriately.

Jones defined task forces as a collection of executing processes that have a single high level objective to fulfill. She gave several examples of a language used to define task forces, so that certain parameters of a task force configuration could be varied to adjust to hardware variation due to failure and repair and to meet objectives of enhanced performance or reliability. Dimensions of task force variation include replication of data and processes, partitioning of data into separately addressable data structures, placement of information in specific memory modules, and process-to-processor assignment. The task force language was developed in conjunction with the StarOS operating system for the Cm^* computer system.

Conclusions

The workshop raised many issues: basic assumptions, costs of processes and messages, models of communication, naming, binding, transparency, reliability and recoverability. All of these fundamental issues must be decided in order to construct a distributed system. It appears that these problems can be solved in a variety of ways and the effects of these decisions are not yet known. Even when the basic system is designed, however, its use poses even more problems.

Discussion at the workshop was energetic. It was apparent that the group did not share a single paradigm of what a distributed system is. The underlying assumptions and objectives of the group members varied, leading them to state problems in different terms and to defend different solution approaches quite staunchly.

Thus, we conclude that further research is needed on distributed computing, both in the design of distributed systems and in the use of distributed systems.

James Gray, Anita Jones, Barbara Liskov and Liba Svobodova have lent their assistance to this report.

Bibliography

L. A. Bjork, "Recovery Scenario for a DB/DC System", Proceedings of ACM National Conference, (August 1973), pages 142-146.
D. Clark, I. Grief, B. Liskov, and L. Svobodova, "Semantics of Distributed Computing", Progress Report of the Distributed Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, (1978).
C. T. Davies, "Recovery Semantics for a DB/DC System", Proceedings of ACM National Conference, (August 1973), pages 136-141.
J. Gray, "Notes on Data Base Operating Systems", in Operating Systems: An Advanced Course, Lecture Notes in Computer Science, Volume 60, Springer-Verlag, (1978), pages 393-481.
H. C. Lauer and R. M. Needham, "On the Duality of Operating System Structures", Second International Symposium on Operating Systems, IRIA, (October 1978).
B. Randell, P. A. Lee and P. C. Treleaven, "Reliability Issues in Computing System Design", Computing Surveys, Volume 10, Number 2, (June 1978), pages 123-165.
D. Russell, "State Restoration Among Communicating Processes", Ph.D. Thesis, Department of Computer Science, Stanford University, (1976).

Appendix

Workshop on Distributed Computing
Participant List

Robert Chansler
Carnegie-Mellon University
Pittsburgh, PA

David Clark
M.I.T. Laboratory for Computer Science
Cambridge, MA

F. J. Corbato
M.I.T. Laboratory for Computer Science
Cambridge, MA

Jack Dennis
M.I.T. Laboratory for Computer Science
Cambridge, MA

James Gray
IBM Research
San Jose, CA

Irene Greif
M.I.T. Laboratory for Computer Science
Cambridge, MA

Carl Hewitt
M.I.T. Laboratory for Computer Science
Cambridge, MA

Anita Jones
Carnegie-Mellon University
Pittsburgh, PA

Barbara Liskov
M.I.T. Laboratory for Computer Science
Cambridge, MA

James Low
University of Rochester
Rochester, NY

Allen Luniewski
M.I.T. Laboratory for Computer Science
Cambridge, MA

James Peterson
University of Texas
Austin, TX

Gerald Popek
University of California
Los Angeles, CA

David Redell
Xerox System Development Division
Palo Alto, CA

David Reed
M.I.T. Laboratory for Computer Science
Cambridge, MA

Paul Rovner
University of Rochester
Rochester, NY

Jerome Saltzer
M.I.T. Laboratory for Computer Science
Cambridge, MA

Craig Schaffert
M.I.T. Laboratory for Computer Science
Cambridge, MA

Santosh Shrivastava
University of Newcastle Upon Tyne
England

Howard Sturgis
Xerox Research
Palo Alto, CA

Liba Svobodova
M.I.T. Laboratory for Computer Science
Cambridge, MA

Stephen Ward
M.I.T. Laboratory for Computer Science
Cambridge, MA

Home

Comments?