E1 Distributed Operating System

E1 Concepts

Requirements

Convenient interface. By their nature, distributed systems are harder for users and software developers to work with than centralized ones. The complicating factors include heterogeneous access to local and remote resources, a high probability of faults, an asynchronous communication environment, and non-uniform memory access. To enable computation in such an environment, a distributed operating system must support a set of abstractions that isolate developers from these complexities and provide a convenient interface to all the resources of the distributed system.

Efficiency. The efficiency of an operating system is determined mainly by how quickly its various resources can be accessed. In a distributed environment, network latency becomes the performance bottleneck. Therefore, a distributed operating system should minimize the influence of remote communication on software operation.

Reliability. In the absence of fault tolerance mechanisms, the failure of a single node or network connection can bring down the whole distributed system and cause loss of data. Therefore, a distributed operating system should support reliable computation, including redundant storage and execution as well as fault recovery.

We now present the approach E1 takes to meeting these requirements.

Transparent access, based on the distributed object abstraction.

To provide applications with a convenient interface to all the resources of a computer network, E1 implements a Single System Image abstraction: to application software, the distributed system looks like a centralized one. This property lets a developer ignore the physical layout of resources and focus instead on the functionality they provide.

The implementation of the single system image in E1 is based on the abstraction of the distributed object. Distributed objects encapsulate the state and functionality of all operating system components. Each object exposes a set of well-defined interfaces that other objects can invoke. Objects are globally accessible through their interfaces from all nodes of the system.

Both operating system components and application software rely on a single E1 object model, i.e. E1 applications are constructed as collections of distributed objects. To an application programmer, the computer network looks and feels like a single virtual computer whose software is structured as a set of objects. Access to hardware resources, as well as interaction between software components, is reduced to invoking methods on the corresponding objects.

Object replication for access efficiency and reliability.

Distributed software systems consist of interacting modules located on different network nodes. Because the operations performed on each node often depend on instructions and data received from remote components, communication latency eventually affects the performance of the entire system. Two popular techniques for overcoming this effect are replacing remote communication with local operations and moving remote communication off the critical execution paths. Replacing remote communication with local interaction means that the state of a server object is cached on the client nodes. Read operations are then performed locally on the cached copy of the object. Modifications can sometimes also be applied locally, with the changes delivered to the server later. Moving remote communication off the critical paths reduces the time that the main computational threads spend waiting for remote messages. For this purpose, additional helper threads speculatively obtain the data required by the main computation.

Object replication generalizes these approaches. In E1, a complete or partial copy of a distributed object's state can be placed on each node where the object is used. The state of the object is synchronized (replicated) among the nodes. Each invocation of an object method is handled by the replica on the node where the call originates. Communication with remote replicas takes place only when the replication protocol requires it, for example, when a missing part of the object state must be obtained.
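The idea of a local replica that contacts a remote one only for missing state can be illustrated with a minimal Python sketch. The Replica class, its fields, and the node names are hypothetical and are not part of any E1 interface; the "network" is simulated by direct access to a peer's dictionary.

```python
class Replica:
    """Hypothetical partial replica of a key-value object on one node."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})  # the part of the object state held locally
        self.peers = []                 # other replicas of the same object
        self.remote_fetches = 0         # counts simulated network round trips

    def read(self, key):
        # A method call is always handled by the local replica.
        if key not in self.state:
            # Only a missing part of the state triggers remote communication.
            self.state[key] = self._fetch_from_peer(key)
        return self.state[key]

    def _fetch_from_peer(self, key):
        self.remote_fetches += 1
        for peer in self.peers:
            if key in peer.state:
                return peer.state[key]
        raise KeyError(key)


home = Replica("node-a", {"x": 1, "y": 2})
cache = Replica("node-b")       # starts with no state at all
cache.peers.append(home)

first = cache.read("x")    # misses locally, fetched once from node-a
second = cache.read("x")   # served from the local copy
```

In this sketch the second read costs no remote round trip, which is the effect the replication approach aims for.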

Thus, distributed communication in E1 is moved inside the distributed object, and the efficiency of access to an object is determined by the efficiency of its replication strategy. No single replication strategy is equally effective for all types of objects. Therefore, E1 does not impose any specific strategy or collection of strategies. Instead, E1 provides services and tools that simplify the construction of replicated objects. As a result, each class of objects can use the most efficient access algorithm for its semantics. Such an algorithm can either be selected from a set of existing replication strategies or designed specifically for the given class of objects.
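One way to picture per-class strategy selection is a pluggable strategy object. The sketch below is only an illustration under assumed names (ReplicationStrategy, WriteThrough, WriteBack, ReplicatedObject); it does not reproduce the E1 strategy library, and replicas are plain in-process objects.

```python
from abc import ABC, abstractmethod


class ReplicationStrategy(ABC):
    """Hypothetical interface: decides how a write reaches other replicas."""

    @abstractmethod
    def on_write(self, replica, key, value): ...


class WriteThrough(ReplicationStrategy):
    """Propagate every write to all peers immediately."""

    def on_write(self, replica, key, value):
        replica.state[key] = value
        for peer in replica.peers:
            peer.state[key] = value


class WriteBack(ReplicationStrategy):
    """Apply writes locally and deliver them to peers later, on flush()."""

    def __init__(self):
        self.pending = {}

    def on_write(self, replica, key, value):
        replica.state[key] = value
        self.pending[key] = value

    def flush(self, replica):
        for peer in replica.peers:
            peer.state.update(self.pending)
        self.pending.clear()


class ReplicatedObject:
    def __init__(self, strategy):
        self.state = {}
        self.peers = []
        self.strategy = strategy  # chosen per class of objects

    def write(self, key, value):
        self.strategy.on_write(self, key, value)


# A counter that must be consistent everywhere uses write-through.
counter_a = ReplicatedObject(WriteThrough())
counter_b = ReplicatedObject(WriteThrough())
counter_a.peers.append(counter_b)
counter_a.write("n", 1)

# A log that tolerates delay uses write-back.
log_strategy = WriteBack()
log_a = ReplicatedObject(log_strategy)
log_b = ReplicatedObject(WriteBack())
log_a.peers.append(log_b)
log_a.write("entry", "hello")
visible_before_flush = "entry" in log_b.state
log_strategy.flush(log_a)
```

The object code calls write() in both cases; only the strategy determines when remote replicas see the change.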

Replication serves not only as a means of efficient access to an object but also as a redundancy mechanism. For example, by maintaining consistent copies of an object on n different nodes, it is possible to tolerate up to n-1 node crashes [46]. Thus, replication uses the hardware redundancy of the distributed system to provide reliable execution of applications.
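The n-1 bound can be checked with a toy model. The sketch assumes fail-stop crashes and replicas that are already consistent; Node and read_any are hypothetical names introduced only for this illustration.

```python
class Node:
    """Hypothetical node holding one consistent copy of the object state."""

    def __init__(self, value):
        self.value = value
        self.alive = True


def read_any(nodes):
    # Any surviving replica can serve the read, since copies are consistent.
    for node in nodes:
        if node.alive:
            return node.value
    raise RuntimeError("all replicas have crashed")


nodes = [Node("state-v1") for _ in range(3)]  # n = 3

# Crash n-1 = 2 nodes: the object state is still available.
nodes[0].alive = False
nodes[1].alive = False
survivor_value = read_any(nodes)

# Crash the last node: the state is lost.
nodes[2].alive = False
try:
    read_any(nodes)
    all_lost = False
except RuntimeError:
    all_lost = True
```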

Component model support

Another important principle underlying the E1 architecture is component model support. Following this principle, the replicated object model has been extended to a component model. This architecture makes E1 a convenient platform for developing distributed applications.

Before proceeding to the discussion of the use of component models in distributed operating systems, we will briefly outline the concepts underlying the component software development paradigm.

The component-oriented approach to software development is based on the idea of constructing software systems from prefabricated reusable components. Components should be independently deployable, i.e. a component can be used by a third party that was not involved in its design and implementation.

A software component is defined as a unit of composition with contractually specified interfaces and explicit context dependencies only [54]. Components inherit essential concepts from object-oriented programming: encapsulation, polymorphism, and availability through interfaces. However, components have additional properties not inherent to objects in object-oriented programming languages. Unlike objects, components are software products. In particular, this means that components can be developed and used independently at different sites. A component is an executable unit rather than a programming language entity. Therefore, implementation inheritance is not supported for components; component reuse is achieved by composition and aggregation. For two components to be interoperable, it is sufficient that they conform to a single component model, regardless of whether they were developed in the same or different programming languages. Components are more independent than objects and consequently have coarser granularity. As a rule, a component is constructed from several programming language objects.

The component model specifies the environment in which components operate, including a protected method invocation mechanism, a naming service, late binding support, a garbage collection service, component development tools, and a number of additional services, e.g. persistence, transactions, replication, and object trading (see, for example, [36]).

Extending a component model across the network yields a convenient environment for developing distributed applications. Besides the other advantages of a component-oriented architecture, it provides network transparency, i.e. components located on different nodes can invoke each other in the same way as in local interaction. This approach is implemented by middleware systems, e.g. COM [33], CORBA [35], and EJB [53].

It is notable that modern distributed operating systems often provide abstractions and services resembling the distributed component models of middleware systems. This can probably be explained by the fact that both classes of systems are intended to serve as software platforms for distributed computing. Like middleware systems, distributed operating systems generally provide unified access to distributed system resources through an object-oriented interface. In some implementations, objects are first-class citizens ([55], [21], [11]), while other systems support simpler primitives, e.g. message ports in Mach [4] and Chorus [43] or portals in Opal [10], on top of which the notion of an object is introduced by the object-oriented application run-time. Distributed operating systems provide a number of services for maintaining distributed objects that are quite similar to key component services. The first is a protected interaction mechanism that supports uniform invocation of object methods from any network node, provided that the caller possesses sufficient capabilities. In addition, distributed operating systems include global naming services that enable binding to an object by its unique identifier. Some systems also support persistence of objects [11, 10, 12, 49, 15].

Despite these similarities, today's distributed operating systems do not provide genuine component models. In these systems, the object abstraction serves primarily as a convenient means of interprocess communication rather than as an application structuring paradigm. Both operating system services and application software are structured as sets of server processes that expose entry points for communication with other programs. Through an entry point, a server exports operations for access to a certain resource or group of resources. These operations are invoked with object semantics: the client specifies the identifier of an entry point, the required operation code, and a parameter set. In response to a call, the server can return one or more values. Thus, an object serves mainly as a communication abstraction.
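The server-style interaction described above (entry point identifier, operation code, parameter set, one or more return values) can be sketched as follows. The operation codes, the FileServer class, and the entry point number 42 are invented for this illustration and do not correspond to any particular system.

```python
OP_READ, OP_WRITE = 1, 2  # hypothetical operation codes


class FileServer:
    """Hypothetical server process exporting block operations."""

    def __init__(self):
        self.blocks = {}

    def handle(self, opcode, params):
        # The server decodes the operation code and returns a tuple of values.
        if opcode == OP_WRITE:
            block_no, data = params
            self.blocks[block_no] = data
            return (len(data),)
        if opcode == OP_READ:
            (block_no,) = params
            return (self.blocks.get(block_no, b""),)
        raise ValueError(f"unknown operation code {opcode}")


entry_points = {}  # entry point identifier -> server


def call(entry_id, opcode, *params):
    # The client names an entry point, an operation code, and parameters.
    return entry_points[entry_id].handle(opcode, params)


entry_points[42] = FileServer()
written = call(42, OP_WRITE, 0, b"hello")
data = call(42, OP_READ, 0)
```

Here the "object" is little more than an addressable message endpoint, which is the limitation the text points out.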

At the same time, the component software development paradigm regards objects as independent software entities with private state, explicit context dependencies, and contractually specified functionality. This notion of components does not fit the framework of modern distributed operating systems. Implementing a component model on top of these systems would require an intermediate software layer, similar to traditional middleware.

We believe that implementing a distributed component model at the operating system level has potential advantages over the middleware approach. The designer of component-oriented middleware inevitably ends up implementing some kind of virtual machine on top of the operating system abstractions, which naturally results in significantly reduced performance. To eliminate this overhead, we suggest that component model support be designed into the operating system from the start. Following this approach, E1 implements a distributed component model based on the abstraction of the replicated object.

At the low level, the E1 component model relies on execution primitives that differ substantially from those used by conventional operating systems. The primary execution abstraction in conventional systems is the process or task, which represents an instance of a program loaded into memory. Each task runs in a separate address space, and several execution threads can exist within a task. This model does not adequately support interacting objects of medium granularity [18]. Therefore, we abandon it in favor of a new execution model tailored to component systems. In E1, all executable code and data belong to objects. All objects reside within a single 64-bit address space. E1 supports the migrating threads model [18], in which the execution of a thread that invokes an object method is transferred to the context of the invoked object. Migrating threads make it possible to depart from the server-style object design, in which an object runs one or several threads to process incoming method invocations.
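A rough user-level analogy of migrating threads is a per-thread stack of object contexts: the calling thread itself enters the invoked object instead of handing a message to a server thread. The Component class and current_context function below are hypothetical, and the sketch models only the bookkeeping, not protection domains or the single address space.

```python
import threading

_current = threading.local()  # per-thread stack of object contexts


def current_context():
    stack = getattr(_current, "stack", [])
    return stack[-1] if stack else None


class Component:
    """Hypothetical object with no threads of its own."""

    def __init__(self, name):
        self.name = name

    def invoke(self, method, *args):
        # The calling thread migrates into this object's context...
        if not hasattr(_current, "stack"):
            _current.stack = []
        _current.stack.append(self.name)
        try:
            return method(self, *args)
        finally:
            # ...and returns to the caller's context afterwards.
            _current.stack.pop()


trace = []  # (thread id, context) pairs recorded inside methods


def store(self, item):
    trace.append((threading.get_ident(), current_context()))
    return f"stored {item}"


def submit(self, backend, item):
    trace.append((threading.get_ident(), current_context()))
    return backend.invoke(store, item)


frontend = Component("frontend")
backend = Component("backend")
result = frontend.invoke(submit, backend, "job-1")
```

Both methods run on the same thread, while the recorded context changes from "frontend" to "backend" and is restored on return.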

Another feature of the E1 component model is that it is based on replicated objects. The ability to replicate is a generic property of all objects. E1 provides extensive support for replication, including a flexible replica communication service and an extensible library of replication strategies.

Besides these services, the E1 component model provides:

A protected interaction mechanism, supporting transparent invocation of object methods from any network node. In E1, all invocations are processed by the local replica of an object. The legitimacy of each call is verified by the distributed Access Control Server (ACS).
A Class Repository and a Dynamic Class Loader.
A Global Naming Service, which maps a unique object identifier to one or more contact points of the given object.
A garbage collection system, which detects and destroys unused object replicas on the basis of reference graph analysis.
Support for persistence, which provides object lifetime control based on reliable storage of a consistent object state in nonvolatile memory.
Component development tools, including the E1 Interface Definition Language compiler and the Replication Strategies Compiler.
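Two of the services listed above, name resolution and access-checked invocation on a local replica, can be pictured with a small Python sketch. All class and function names here (NamingService, AccessControlServer, invoke, Counter) are assumptions made for illustration; the actual E1 interfaces are not shown in this document, and the access check is a plain in-memory lookup rather than a distributed service.

```python
class AccessDenied(Exception):
    """Raised when the caller lacks the right to invoke a method."""


class NamingService:
    """Hypothetical map from object identifier to contact points."""

    def __init__(self):
        self._contacts = {}

    def register(self, object_id, node):
        self._contacts.setdefault(object_id, []).append(node)

    def resolve(self, object_id):
        return list(self._contacts.get(object_id, []))


class AccessControlServer:
    """Hypothetical stand-in for the ACS: a set of granted rights."""

    def __init__(self):
        self._grants = set()

    def grant(self, caller, object_id, method):
        self._grants.add((caller, object_id, method))

    def is_allowed(self, caller, object_id, method):
        return (caller, object_id, method) in self._grants


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


def invoke(acs, caller, object_id, local_replica, method, *args):
    # The call is verified first, then handled by the local replica.
    if not acs.is_allowed(caller, object_id, method):
        raise AccessDenied(f"{caller} may not call {method} on {object_id}")
    return getattr(local_replica, method)(*args)


naming = NamingService()
naming.register("obj-17", "node-a")
naming.register("obj-17", "node-b")
contacts = naming.resolve("obj-17")

acs = AccessControlServer()
acs.grant("alice", "obj-17", "increment")
replica = Counter()
allowed_result = invoke(acs, "alice", "obj-17", replica, "increment")
try:
    invoke(acs, "bob", "obj-17", replica, "increment")
    bob_denied = False
except AccessDenied:
    bob_denied = True
```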

Since both operating system services and application software are developed within the framework of a single E1 component model, the model has to be highly flexible, while introducing minimal overhead.


	Copyright E1 Team 2003


		mail:team@E1OS.org