A Different View of Operating Systems

James L. Peterson

Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712

1. Introduction

As the use of computers has increased over the past few decades, so have the size and complexity of computer systems. This growth in ability has occurred in both the hardware and the software of the computer. In particular, operating systems have developed which transform a set of diverse hardware components into a usable computing system. Operating systems provide the users of the system with a convenient environment for the conception, programming, execution, and modification of solutions to their computing problems. The crude batch processing of early operating systems has evolved into the multiprogrammed and multiprocessor systems of today. These facilities are generally associated with large computer systems, but even midi- and minicomputer systems are attempting to provide time-sharing and multiprocessing features.

As the services provided by operating systems have increased, both in number and sophistication, so have the size and complexity of the operating systems. With this growth in size and complexity has come an increase in the cost of system overhead and a decrease in the reliability, flexibility, and robustness of the systems. This seems to have been primarily the result of attempts to extend the techniques used for the earlier, simple systems to the later, larger, more complex systems by simply employing more people and more code to create them. It became obvious, however, that new techniques and concepts for the design and implementation of operating systems are necessary if large, sophisticated systems are to be reliably created. This has resulted in much ongoing research into programming techniques, design methodologies, and concepts for operating systems.

1.1. Processes

One concept which seems to have gained widespread acceptance is the concept of a process. The process is fundamental to the design of some current operating systems, much operating system theory and research, and several textbooks on operating systems. The definition of a process varies widely, from precise automata-theoretic notions to vague descriptions in terms of (virtual) processors and computations. Still, despite the vagueness and variety of definitions among various sources, a common intuitive notion underlies most of this work. The process concept provides a convenient conceptual tool for abstracting the computing and resource requirements and properties of the sequential portions of a computation into identifiable, manipulable entities.

The entire operating system can be designed as a set of asynchronous cooperating sequential processes. With this approach, the design and construction of an operating system can be readily decomposed into two separate problems: the design of the operating system kernel, and the design of the "user visible" system processes. The latter processes include the language processors (compilers, interpreters, assemblers), the file system, the job control language interpreter, the utility routines, etc., while the former involves the specification of interprocess control, communication, and synchronization as well as basic resource allocation. An operating system kernel provides the basic environment within which all processes exist and perform their programmed functions.

1.2. The Kernel

The kernel should be kept as small and simple as possible in order to avoid expending valuable system resources for overhead. Also, a limited size and complexity would increase the reliability and probability of correctness, as well as the understandability and maintainability of the basic system. The kernel should provide a number of facilities, among them an interrupt handler, device drivers, a CPU multiplexer, operations for the creation, activation, de-activation, and destruction of processes, an interprocess communication facility, memory management, and some form of protection. It would normally not perform complex or variable functions such as job scheduling, control language interpretation, input/output control and scheduling, or supervisor requests. Basically, the kernel is a mechanism for transforming a given hardware configuration into a set of cooperating, asynchronous, sequential processes.

This still leaves a great deal of work to be done in the design of the kernel. The structure and organization of the environment within which the processes exist will have a profound effect on what types of operating systems may be built from them. For example, the expense of creating a new process may determine whether, in practice, there is a fixed number of system and user processes, or whether a user may spawn additional processes freely and quickly. In the extremes, there may be only one monolithic system process, or a system in which a user could economically spawn additional processes to solve minor problems in parallel. There are also many variations possible in the other elements of the kernel.

However, the problems involved in the construction of the operating system kernel (CPU scheduling, interrupt handling, memory management, and so on) are basically well-known problems. These problems have formed the basis of operating systems research for many years, and although research is continuing, reasonable solutions are known. The design problem for the kernel is mainly a problem of selecting, from the many published studies, those algorithms and techniques which best fit the particular machines and systems to be developed. The design and implementation of the kernel thus seem more of a development problem than a research problem. This is reflected by the emerging trend to move kernel functions into the hardware, or firmware, of new computer systems.

A major unsolved problem for operating systems research is the design and organization of those system processes which are supported by the kernel. The kernel allows the creation of multiple concurrent processes; the problem remains to put them to effective use. This problem can be considered as the design and control of a community of processes. From this point of view, the operating system acts as a government, providing certain basic services, guaranteeing certain basic rights, and enforcing certain basic rules of conduct, for the long-term benefit of the community as a whole. This anthropomorphic point of view can be very helpful in the creation of new system designs by allowing the relatively abstract system design problems to be cast as more familiar problems of social and political systems.

To the extent that societies and governments exist and continue to function, the problems of coordinating various asynchronous components have been solved. Social entities such as business corporations, educational institutions, and governmental bodies have designed and implemented existing systems for providing services (of at least a limited nature) to their users. Designers of computer systems may be able to adapt the solutions in these systems to design problems in operating systems.

In this paper, we present, as food for thought, some analogies between process behavior in a computer system and human or corporate social behavior. We examine, in particular, the areas of operating system process management and resource allocation. The problem of process management, dealing with the creation, deletion, and interaction of processes, is perhaps the easiest to compare with similar human behavior. Allocation of resources to processes is somewhat less comparable, but some new ideas are still possible.

It is to be stressed that a great deal of caution must be used whenever analogies between computer systems and other systems are drawn. Despite some similarities, basic differences in such characteristics as size, speed, number of processing units, memory size, predictability, and cost may be sufficient to nullify any reasonable analogy. We suggest only that some reasonable analogies can be drawn between operating systems based on the concept of cooperating processes, and other organizations based upon asynchronously interacting cooperating entities (like people), and that these analogies should be considered for possible design ideas. We do not claim that solutions to one set of problems will necessarily be good solutions (or even solutions) to analogous sets of problems in another system.

In addition to the design problems for operating systems are the subordinate problems of implementation. What we are considering now is the basic philosophy behind the interaction of processes within a computer system. An effective, efficient implementation of the system design resulting from anthropomorphic analogies may or may not be possible. We will not consider possible implementation problems here.

We begin by examining problems of process management.

2. Process Management

Perhaps the most fundamental design considerations for a system of processes are the problems of interprocess structure and organization. The problem here is to pick an appropriate organizational structure to allow effective solution of a problem. This requires defining the number of processes, their roles, and the lines of communication, and is similar to the problems faced when a new branch of government, corporation, division, or department is created as a social organization.

There are two central, interrelated problems here: responsibility relationships and communication channels. Responsibility relationships define which processes are subordinate to which other processes and establish decision-making and supervision responsibilities. The most common responsibility structure is undoubtedly a hierarchy. The hierarchical structure is used by many businesses and governments. This provides a familiar and easily understood set of interprocess responsibility relationships, since every process is a subprocess of some other process. This structure may be too restrictive, however, because of the problems of sharing subprocesses.

Another possible structure is the system of totally independent processes. Other possible structures could be constructed as combinations of these systems, creating analogies to state or local governments.

A separate, but related, problem is the problem of process communication. The system may be designed to limit the flow of information between processes. For example, in a hierarchical system, communication may be limited strictly to exchanges between a subordinate process and its superior process. This would allow a department head to control completely the flow of information from its entire subtree. Alternatively, the communication structure may be independent of the responsibility structure, allowing free communication between any pair of processes, or some other structured communication scheme.
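
The two communication policies just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Process, spawn, and may_communicate are hypothetical and do not come from any particular system.

```python
class Process:
    """A node in a hierarchical responsibility structure (hypothetical)."""

    def __init__(self, name, superior=None):
        self.name = name
        self.superior = superior      # None for the root of the hierarchy
        self.subordinates = []

    def spawn(self, name):
        """Create a subordinate process responsible to this one."""
        child = Process(name, superior=self)
        self.subordinates.append(child)
        return child


def may_communicate(a, b, strict_hierarchy=True):
    """Return True if a message may pass directly between a and b."""
    if not strict_hierarchy:
        # Communication structure independent of the responsibility structure.
        return True
    # Strict policy: only a subordinate and its immediate superior may talk.
    return a.superior is b or b.superior is a


department_head = Process("department_head")
clerk = department_head.spawn("clerk")
typist = department_head.spawn("typist")

print(may_communicate(department_head, clerk))                # True
print(may_communicate(clerk, typist))                         # False
print(may_communicate(clerk, typist, strict_hierarchy=False)) # True
```

Under the strict policy, any information passing from clerk to typist must be relayed by department_head, which is what gives the superior control over the flow of information from its subtree.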

Interprocess communication can be one of the crucial design problems of an operating system. The flow of information between processes can be very important to the successful design of a system. An interesting perspective can be obtained from human society. The most obvious conclusion is that a wide variety of communication mechanisms may be necessary. For specific problems, any of the several forms may be most appropriate. These include conversations (short messages alternately between processes), mail (longer messages in one direction), freight delivery (for large objects), and broadcast information (from a source to many users).

Another important aspect of process control is process creation. If a system is to consist of multiple processes, these processes must be created and initialized in some way. The simplest way is, of course, for them to be created at design time by the system designers. When the system starts, all existing processes are also brought to life, and each begins its pre-programmed task. With this method, the cost of creating a new process is nearly prohibitive. One must redesign the old system and replace it with a new one, with a possibly severe disruption of service.

More recent systems have been designed with the ability to create other processes as a part of the normal (system) instruction set. Problems then arise of a reasonable cost for creating a new process and limitations on the ability to do so. The cost of creating a new process will make a considerable difference in the uses to which processes are put. If the cost in time to create a process is approximately the same as the expected lifetime of the process, very few additional processes will be created. On the other hand, if the time to create is small relative to the expected lifetime of the process, separate processes may be created for minor tasks. Cost should not be limited simply to time, however, but should obviously include such factors as space, complexity, system degradation, and accounting system charges.

As a possible indication of the order of magnitude which would be a reasonable cost, one can consider that 9 months is approximately 1% of the expected human life-time. Alternatively, 18 years is closer to 25%.
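
The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes an expected human lifetime of 72 years, a value chosen here only because it is consistent with both percentages; the text itself does not state one. The function name creation_overhead is likewise hypothetical.

```python
def creation_overhead(creation_time, expected_lifetime):
    """Fraction of a process's expected lifetime spent on creating it."""
    if expected_lifetime <= 0:
        raise ValueError("expected_lifetime must be positive")
    return creation_time / expected_lifetime


# Assumption (not stated in the text): an expected lifetime of 72 years.
ASSUMED_LIFETIME_YEARS = 72.0

gestation = creation_overhead(0.75, ASSUMED_LIFETIME_YEARS)   # 9 months
upbringing = creation_overhead(18.0, ASSUMED_LIFETIME_YEARS)  # 18 years

print(f"{gestation:.1%}  {upbringing:.1%}")  # 1.0%  25.0%
```

The same ratio, with time measured in machine cycles or seconds rather than years, gives a rough indication of whether a process is cheap enough to create for a minor task.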

The attributes and responsibilities of a process at time of birth should also be considered. One can consider that processes are created as full-fledged members of the computing community or, alternatively, as "minors" who must develop and evolve before being allowed the full power of the machine. This would ensure that the abilities of undebugged "adolescent" processes are limited. As the process "matures", it would be allowed to request and receive additional resources and be expected to interact on a more sophisticated level.
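
One way to picture such a maturation scheme is sketched below. Everything in it is an assumption made for illustration: maturity is measured by a count of error-free runs, and the limits (4 and 64 resource units, 3 clean runs) are arbitrary placeholder values, not figures from any real system.

```python
class MaturingProcess:
    """A process whose resource ceiling rises once it has proven itself.

    Hypothetical sketch: all limits and the maturity rule are assumptions.
    """

    def __init__(self, name, minor_limit=4, adult_limit=64, maturity_runs=3):
        self.name = name
        self.minor_limit = minor_limit      # units a "minor" may hold
        self.adult_limit = adult_limit      # units a mature process may hold
        self.maturity_runs = maturity_runs  # clean runs needed to mature
        self.clean_runs = 0

    @property
    def is_minor(self):
        return self.clean_runs < self.maturity_runs

    def record_clean_run(self):
        """Note one more run completed without error."""
        self.clean_runs += 1

    def may_request(self, units):
        """Return True if a request for this many units would be allowed."""
        limit = self.minor_limit if self.is_minor else self.adult_limit
        return units <= limit


p = MaturingProcess("new_compiler")
print(p.may_request(16))   # False: still a minor
for _ in range(3):
    p.record_clean_run()
print(p.may_request(16))   # True: now mature
```

A real system might instead measure maturity by elapsed time, by the verdict of a supervising process, or by some combination; the sketch fixes one rule only to make the idea concrete.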

The idea of the development of a process over time brings up the possibility of a number of other anthropomorphic concepts. For example, many systems now require that their component processes be without error. Any indication of a different situation may cause the death of the entire system but will at least eliminate the erring process. Perhaps systems should be designed which merely classify such processes as insane or ill. Suffering from a disease which causes the process to be unable to act, or to act correctly, these processes could be referred to other processes for diagnosis of the error(s) and corrective action.

Corrective action may include a restricted diet (the process may not be able to handle certain inputs), corrective surgery including amputation (elimination of poor code), transplants (replacement of incorrect or damaged subroutines or procedures), or implants (adding code or data structures). Perhaps only medicine would be needed (special inputs to reset internal variables and states). Such techniques may be quite useful for automatic error recovery and could extend the expected lifetime of the process considerably. As with people, such problems would be most likely to occur while the process is young and still relatively undebugged. As the process becomes older, health problems would probably become less common but perhaps much more severe, and the illness may be fatal.

Hopefully, the majority of the processes will not disappear from the machine because a previously undiscovered bug in their programs finally proved fatal, but will instead either be retired from the system or be judged senile. In either case, the process has served its purpose and has had a (hopefully long) useful career, but as time passes, the environment of the process changes, and the process may find that there is little use for the services which it can provide. This may occur because of the possibly transient nature of the service which it provided, or because a new process is replacing it. This occurs in contemporary systems when, say, a new version of a compiler is released and the old version is retired, but not removed from the system until the new version is accepted. Those processes which are being phased out can be placed on a device whose access time is comparatively long, as a form of retirement home for old processes. At a later time, they could be completely removed from the system.

This latter event might be considered the death of the process. The death of a process is not a concept which has been carefully considered in many systems. There is a wide range of current responses to the death of a process: the system may die also, the dead process may be removed from the system, or its death may go completely unnoticed. It is important here to be able to distinguish between the ill process and the dead process. The ill process may be able to recover with proper help, while the dead process either cannot be saved, or medical treatment is not economical, or it has been condemned to death by some other processes.

In analogy to a death in an anthropomorphic system, many steps can be taken when a process dies. An autopsy or post-mortem examination may be useful to pinpoint the cause of death. Pending the results of this investigation, the process could be moved to a special structure (a morgue) until it can be disposed of properly. One may need to consider the question of notification of next-of-kin in some hierarchical systems. In any event, there may still be processes which are trying to communicate with the dead process through the system's message facility; they should be notified of its death.

Disposition of the effects (resources) of the process should also be carefully considered. Should all of a process's property (i.e., memory, I/O resources, etc.) be returned to the state (i.e., the operating system), or can a process leave a will specifying what should be done with its estate? Also, in certain structures of processes, particularly hierarchical systems, one may have to worry about the dependents of a process. Should they be killed when their guardian dies, or should they be allowed to continue (and if so, under whose care)? Perhaps they should become wards of the state and be placed in an orphanage. These questions may require design consideration, and there is a wide range of alternatives available to the system designer.
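
To make these alternatives concrete, the following sketch fixes one possible policy among the many available: resources named in a will go to the named heir if it is still alive, all other resources revert to the state, and dependents become wards of the state. The class Process, the function bury, and the resource names are all hypothetical and chosen only for illustration.

```python
class Process:
    """A process with an estate, a will, and dependents (hypothetical)."""

    def __init__(self, name, guardian=None):
        self.name = name
        self.guardian = guardian
        self.dependents = []
        self.resources = []   # the estate
        self.will = {}        # maps a resource to the Process that inherits it
        self.alive = True
        if guardian is not None:
            guardian.dependents.append(self)


def bury(process, state):
    """Settle the estate of a dead process and re-home its dependents.

    Assumed policy: willed resources go to a living heir, everything else
    reverts to the state, and orphans become wards of the state.
    """
    process.alive = False
    for resource in process.resources:
        heir = process.will.get(resource, state)
        if not heir.alive:
            heir = state
        heir.resources.append(resource)
    process.resources = []

    for orphan in process.dependents:
        orphan.guardian = state
        state.dependents.append(orphan)
    process.dependents = []

    if process.guardian is not None:
        process.guardian.dependents.remove(process)


state = Process("operating_system")
parent = Process("parent", guardian=state)
child = Process("child", guardian=parent)
heir = Process("heir", guardian=state)

parent.resources = ["tape_drive", "memory_block"]
parent.will = {"tape_drive": heir}

bury(parent, state)
print(heir.resources)        # ['tape_drive']
print(state.resources)       # ['memory_block']
print(child.guardian.name)   # operating_system
```

The other alternatives raised above, such as killing the dependents or passing them to the dead process's own guardian, would change only the loop over process.dependents.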

With the ideas of process birth, growth, and death, other concepts may be useful for system designers. For example, there would exist an average lifetime of a process. The expected lifetime may be an important consideration in the number of processes which the system can support without overpopulation. It should also influence the data structures which are used with processes and the frequency of use and design of the birth and death portions of the operating system.

Process control is full of analogies to human social behavior. The environment of the process, its creation, development, work, and eventual retirement and/or death are determined by the kernel of the operating system. The very structure of the processes, the allowed interactions between processes, and the permissible behavior of a process similarly have a great influence upon the structures and uses of the eventual complete system and can be designed in many different ways. Some alternatives can be discovered, and their implications hinted at, by a critical examination of, and analogy to, other systems and, in particular, the human social system.

3. Resource Allocation

Viewing a computer system as a community of processes, with the operating system acting as a government, provides an interesting viewpoint on the resource allocation problem. Resource allocation in human societies is the classical problem area of economics. As such, all the various opinions and approaches of economic theory can be applied.

Most current computer systems are designed with the ownership of all resources resting with the government. These resources are allocated to each process according to its need, clearly a communistic approach. It has been proposed recently that operating systems are beginning to die out, leading to a withering away of the state, as would be expected according to Marxist-Leninist theory.

Could capitalistic or socialistic resource allocation systems be developed, and how would they compare to the present communistic approaches?
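
As a starting point for such a comparison, the sketch below places a need-based allocator beside a price-based one. Both are deliberately simplistic and rest on assumptions made here for illustration: a single divisible resource, requests served in arrival order in the first case, and sealed bids of a price per unit in the second. The function names and figures are hypothetical.

```python
def allocate_by_need(supply, needs):
    """State-owned resource: grant each stated need, in arrival order,
    until the supply runs out. needs maps process name -> units needed."""
    grants = {}
    for name, need in needs.items():
        grant = min(need, supply)
        grants[name] = grant
        supply -= grant
    return grants


def allocate_by_bid(supply, bids):
    """Market allocation: serve the highest price per unit first.
    bids maps process name -> (units wanted, price offered per unit)."""
    grants = {name: 0 for name in bids}
    by_price = sorted(bids.items(), key=lambda item: item[1][1], reverse=True)
    for name, (units, _price) in by_price:
        grant = min(units, supply)
        grants[name] = grant
        supply -= grant
    return grants


needs = {"editor": 6, "compiler": 6}
bids = {"editor": (6, 1), "compiler": (6, 3)}

print(allocate_by_need(8, needs))  # {'editor': 6, 'compiler': 2}
print(allocate_by_bid(8, bids))    # {'editor': 2, 'compiler': 6}
```

With the same 8 units of supply, the two schemes favor different processes: arrival order decides in the first, willingness to pay in the second. Whether either outcome is preferable depends on goals that the allocator alone cannot supply.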

4. Conclusion

This paper is meant to evoke thought. The obvious analogy between human society and government, on the one hand, and a computer system of processes and its operating system, on the other hand, can lead to many intriguing possibilities, and could be a great aid in explaining our complex systems to their users and designers.