Messages, Ports, Guardians and Processes

James L. Peterson

DSG Note 25 MIT Laboratory for Computer Sciences
545 Technology Square
Cambridge, MA 02139

1 November 1978

Guardians have been defined as a local collection of processes and data. Hence we could expect a guardian definition to be similar to the following:

	guardian <heading>;
	<local data and data abstractions>;
		<process 1>;
		<process 2>;
		    ...
		<process 3>;
	end guardian;

From this definition, it is obvious that the active components of a guardian are its processes. Hence it is the processes which send and receive messages, and not the guardian. Therefore, we concentrate on messages, ports and processes in the remainder of this note.

Ports and Messages

The message system allows a process to send a message and another process to receive it. There are several ways that this is possible. One way is to send the message directly from one process (A) to the other process (B). In this case process A would execute a statement like:

	send to B message M;

Process B would include:

	declare X: process_name;
	...
	receive message M from X
	if X=A then ...

The major problem here is that the process A must send a message directly to another process requiring knowledge of B's name; it is also difficult to provide multiple channels of communication between processes.

In this scheme a message is sent to a passive message queueing station (the port), and stored in this system space until it is received by a process. Notice that by associating exactly one port with each process, the previous system of direct messages results. A direct message is sent to a specific process by sending to the (unique) port associated with that process. The more general port mechanism decouples the message queueing station (the port) from the process, introducing a new system concept and the overhead of creating and maintaining ports in exchange for the additional convenience in the structuring of applications. There is no added power in this approach since dummy processes can be created to simulate ports in the direct message system.

Ports introduce an extra level of indirection to the message system. A sending process no longer needs to know what process (or processes) may work on a message, only the name of the port to which the message should be sent.

	send message M to port P;

The receiver now has additional flexibility since it can receive messages from multiple ports.

	receive message M1 from port Q;
	...
	receive  message M2 from port P;

Similarly two (or more) processes may attempt to receive from the same port, allowing the existence of multiple consumers or servers for messages.

The port construct is a data structure of two lists: a list of messages which have been sent, but not yet received, and a list of processes which are attempting to receive messages (if there are no messages to be delivered). (Note that at least one of these lists is empty at any given time.) Five basic operations on the port construct are possible:

port$create: create a port with empty lists and return its name (see the discussion of port and message typing below).
port$send: if there is a process waiting for a message, copy the message to the waiting process, remove it from the list, and restart it; if no process is waiting, add this new message to the list of messages.
port$receive: if there is a message waiting, copy it, remove it from the list and continue; otherwise add this process to the list of waiting processes and wait.
port$empty: return true if no messages are waiting in the port, false if messages are waiting.
port$delete: delete the named port; all further references to this port will raise an exception: "nonexistent port".

The port$empty operation is included for completeness, but we suggest that it neither be implemented nor used, since it can lead to race conditions and busy waiting.

Messages are objects. As such, they have a type. Similarly, a port has an associated type. The type of a message in a send or receive must be included in the type of the port. This is in line with the strong-type nature of CLU and other modern languages. This leads to a need to declare the type of the port; this type is a parameter to the port$create operation.

	p:  port[string];
	q:  port[int];

	...

	p := port[string]$create();
	q := port[int]$create();

	...

	send "X01 TIME" to p;
	send 10 to q;

The use of the one-of construct allows multiple (discriminated) types of messages to be sent through a port.

The general model has no coupling between requests and replies. In many situations this coupling exists and should be explicitly recognized. However, we would contend that this coupling should not be part of the system level view of messages, but rather a language feature. A reply port is accomplished in a request by including the reply port definition in the request message.

	T = record
		m: string;
		r: port[char];
	    end;

	req: port[T] := port[T]$create();
	rep: port[char] := port[char]$create();

	send ["HELLO",rep] to req;
	receive s:char from rep;

and in the other process

	m: T;

	receive m from req;
	...
	send 'Y' to m.r;

Reply ports can be made easier to use by syntactic sugar which implicitly defines the request message as a record with a reply port part and a message part. Compatible sugar would be needed at both the request (and reply?) port definition and at the send and receive statements.

Multiple Waiting

It has been suggested that situations will arise in which a process needs to wait on several ports. Two questions arise in regard to this problem: first, how often (if at all) will this be needed and second, can this need be satisfied in the proposed system? We note that there are several possible forms of multiple waiting. One of these is the AND-multiple-wait, where a process needs to wait for a message from port A and a message from port B. This can easily be handled by two sequential receive statements (in any order).

A more difficult problem arises with OR-multiple waits. Once could conceive of two forms of this multiple wait: exclusive-OR and inclusive-OR. This distinction arises in the following construct which allows OR-multiple-waits in a system with only a single wait capability. To wait on either of two ports, A or B, we create two dummy processes, PA and PB, which simply receive a message (from A or B, respectively) tag it with its source and send it on to a new port AB. A receive on port AB now gets a source-tagged message from either A or B. The problem is that if two messages are sent, one to A and one to B, both of them will be moved to port AB, and hence both must be processed by the OR-waiting process (inclusive-OR-wait). This differs from the assumedly desired behavior of receiving a message from either A or B, but not affecting the other (exclusive-OR-wait). To achieve this effect, it would be necessary to receive the second message and "put it back" in the port from which it came. (And this would require knowing that there was a second message and race conditions would probably result.)

We note also that if it is obvious that the same process will receive two messages sent to different ports, then these messages perhaps should not be sent to different ports. One can conceive of a file process with two ports: READ and WRITE. The two ports are used to distinguish READ requests from WRITE requests, but if requests must be processed by the same file process, it would be preferable (?) to simply have one REQUEST port and explicitly tag each message as either a READ or WRITE request. This would eliminate the need to wait on two ports.

Only experience with real systems will tell us if an OR-multiple wait capability is needed. If it is, then the most reasonable scheme would seem to be the one suggested by Wayne Gramlich, based on his experience with Hydra. In this scheme, receives can be made from any member of a set of ports. The sets are dynamically defined and manipulated as a predefined type. A case statement would immediately follow the receive command to indicate the source port of the command:

	receive from port_set$[p,q,r];
	case of
		p,r (message m1): ...;
		q   (message m2): ...;
	end case;

The implementation of this approach can be quite complex since for every port we need a list of the processes waiting on that port. In addition, we need a list of all ports on which a given process is waiting, so that the process can be removed from their lists when a message is given to the process as a result of a message arriving at one of the ports. This can be quite complex. However, a simple restriction simplifies the problem considerably: a port can be in at most one port-set at any given time. This seems a reasonable restriction and allows the following implementation: Each port-set is an object. A port points to the port-set of which it is currently a member. The port-set has an associated list of processes waiting for messages. When a message arrives, the port follows the pointer to the port-set, picks a process off the list, and gives it the message; this process is no longer on the list of the port-set, or of the other ports.

A port may be added to or removed from a port set whenever there are no processes waiting for messages from it. It is not clear if this can be enforced at compile time, or if it requires runtime enforcement and the associated exception handling.

A lesser problem of some concern occurs when each of the several ports in a port set have a message when a receive is executed: which port is used? Possibilities include:

priority (if A and B both have messages, give me A's message first),
randomness (flip a bit, if 0 give me A, if 1 give me B).
alternation (keep track of the last decision and make the opposite one this time)
time (time stamp by send time and oldest message first)

Time-Outs

Time-outs have been suggested as one requirement for our system. The only reasonable place to attach a time-out would seem to be with a receive statement. Bluntly, this is the only place in which a process waits and therefore the only place an extra mechanism is needed to prevent death caused by the inactivity of another process. Although it is true that the waiting time may well be thought of as from the time of the send, (for a request/reply situation), it is not unreasonable to assume that the time from the send to the receive could be estimated or measured allowing the amount of time remaining to be calculated and used as the parameter for the time-out.

A more severe problem is the question of what kind of time should be be used for time-outs? Two kinds of time exist for a process: real time and apparent time. (Apparent time is often called cpu time.) If processes are multiprogrammed, these times are not the same, since real time is the sum of apparent times. If process A has sent a request to process B and is waiting for a reply, and expects B to compute 10 units, it would want to time out after 10 (plus epsilon) units of B's apparent time. I can seen no way to correctly express this, since the only times available to A are A's real time and apparent time (and since A is waiting, only A's real time is of interest). However, the relationship between A's real time and B's apparent time is workload dependent and highly variable (at the minimum). Perhaps the conclusion from this discussion is that time-out parameters must be extreme upper bounds, on the order of hours for simple tasks. It is suspected that in this case, if failures occur, this will be an unreasonable detection and recovery mechanism.

Consider the following example. Process A makes a request of Process B which among its own calculations makes a request to process C which also makes a request to process D. A normally gets an answer to its request within one second. D normally takes 50 milliseconds, but in the worst case (unstable matrix near the conversion point), 10 cpu seconds are needed. Also D is multiprogrammed and so may execute at only 10% normal speed under heavy workloads. So C can not time-out in less than 100 cpu seconds with any certainty of D being broken. In addition if C thinks D is broken (due to time-out), it retries 4 times before giving up. Thus, B must wait up to 500 seconds before it can correctly assume C is broken. B switches to a different algorithm if C does not respond, and requires a look-up on magnetic tape which may require (in the worst case) up to half an hour wait to mount the tape. Thus, A should not time-out any earlier than 40 minutes for a transaction which normally takes under a second.

Setting Up Communication

Another aspect of this port message system to be considered is initialization. If two arbitrary processes wish to communicate they must have a port in common. But if one party creates the port, how does it pass the name of that port to the other party? This problem can be solved by several techniques, the simplest uses the primal agents.

Ports will be predefined by the system for communication between the primal agents. Many of the basic application ports will be created by the primal agent of each node and passed to application processes as parameters. In most applications, this form of initialization time port creation will provide most of the static communication paths in the abstract network. Ports created later will be used mainly for conversations and replies and can be passed from one process to another by sending a message containing the reply port name over a pre-existing port.

An alternative in a general-purpose system in which ports will be dynamically created often is to define a well-known process which maintains a directory of process names, descriptions, and ports to be used to communicate with these processes. A user can then search the directory to select the process or service desired and find a port to use. The obvious analogy is to a telephone book.
Home Comments?