System Support for Partition-Aware Network Applications
Department of Computer Science - University of Bologna
Partitions are a fact of life in most practical distributed systems and they tend to become more frequent as the geographic extent of the system grows or its connectivity weakens due to the presence of wireless links. In addition to accidental partitions caused by failures, mobile computing systems that support "disconnected operation" have to face partitions when units are deliberately unplugged from the network. What distinguishes partitions from ordinary communication failures is that they disrupt communication between sites and the usual system layers cannot hide this fact from applications. Hiding it would require special communication layers that buffer messages at their origin throughout a partition and retransmit them upon reconnection. Even if all partitions are eventually repaired, this approach may be impractical for several reasons. First, the number of messages that need to be buffered for retransmission during extended periods of interrupted communication may grow arbitrarily large. Second, communication state information has to survive across site failures and thus has to be maintained in stable storage. More importantly, portions of an application that span multiple partitions remain blocked until communication is restored, thus precluding continued availability in concurrent partitions.
For certain application classes with strong consistency requirements, all services may have to be suspended completely in all but one partition. This situation corresponds to the so-called primary-partition model that has traditionally characterized partitioned operation of network applications. Our effort is focused on system services that support partition awareness such that continued operation of network applications is not restricted to a single partition but may span multiple concurrent partitions. The system provides the necessary abstractions such that the application itself can decide which of its services will be available in each partition and what level of consistency can be guaranteed. As an example, consider a replicated data application. To guarantee a form of weak consistency, it may be possible to service read operations in any partition containing a copy of the data, while write operations have to be restricted to a single partition according to some rule (e.g., inclusion of a majority of replicas). In other words, partitions result in service reduction (in some partitions write operations are unavailable) but no appreciable service degradation. On the other hand, a network application for parallel computing that decomposes a single task among available sites will exhibit service degradation but no service reduction: the result of the task is eventually known in every partition, but smaller partitions take longer to compute it due to reduced parallelism.
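The replicated data example can be illustrated with a minimal sketch. It assumes a fixed, known number of replicas and uses the majority rule mentioned above; the names `Partition`, `can_read`, `can_write`, and `available_services` are hypothetical and are not part of any system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one partition: the replica ids reachable in it."""
    reachable_replicas: frozenset


def can_read(partition):
    # Reads are served in any partition that contains at least one copy of the data.
    return len(partition.reachable_replicas) > 0


def can_write(partition, total_replicas):
    # Writes require a strict majority of all replicas (assumed rule), so at most
    # one of several concurrent partitions can accept them.
    return 2 * len(partition.reachable_replicas) > total_replicas


def available_services(partition, total_replicas):
    # Service reduction: the set of offered operations shrinks in minority partitions.
    services = []
    if can_read(partition):
        services.append("read")
    if can_write(partition, total_replicas):
        services.append("write")
    return services
```

With five replicas split into partitions {1, 2, 3} and {4, 5}, this sketch offers reads and writes in the first partition and reads only in the second. With an even split of four replicas, neither side holds a strict majority, so writes are unavailable everywhere until the partitions merge.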
We propose a set of services that need to be supported by the system for developing large classes of partition-aware applications in a systematic manner. Our methodology is based on the process group paradigm suitably extended to partitionable systems such that the group composition is an effective abstraction of the environment with respect to partitions and merges [BDM97]. Members of a group cooperate in order to implement a given network application. Partition-aware applications are programmed so as to reconfigure themselves and adjust their behavior using the current composition of the group as input. We have specified a partitionable group membership service (PGMS) that forms the basis of our support layer. PGMS alone may be sufficient for programming a class of self-configuring network services. To support a broader class of applications that require closer cooperation, we augmented PGMS with a reliable multicast communication service satisfying view synchrony semantics. View synchrony integrates message deliveries with group composition changes such that group members can reason globally about the set of messages delivered by others based solely on local information. The support layer we propose can be seen as a "micro-kernel" for building partition-aware applications in that it includes only the minimum indispensable set of services. In particular, message ordering issues are not treated by the system but are left up to the application, which knows best how to resolve them.
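As a rough illustration of reconfiguration driven by group composition, the sketch below shows a member of a parallel computing application re-dividing a task whenever a new view is installed. The `Worker` class and the `on_view_change` callback are hypothetical names chosen for this example; they are not the PGMS interface, and the round-robin assignment is only one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    my_id: str
    tasks: tuple  # full, ordered collection of task identifiers
    assigned: list = field(default_factory=list)

    def on_view_change(self, view):
        """Hypothetical callback, assumed to be invoked with the set of member
        ids in the newly installed view (which always includes this member)."""
        members = sorted(view)  # every member of the view derives the same order
        rank = members.index(self.my_id)
        # Deterministic round-robin split: each member of the view computes its
        # share from the view alone, without exchanging further messages.
        self.assigned = [
            task for index, task in enumerate(self.tasks)
            if index % len(members) == rank
        ]
```

When a view {a, b, c} shrinks to {a, b} after a partition, the two remaining members again cover the whole task between them, each with a larger share. This matches the service degradation described earlier: the result is still computed in the smaller partition, only with less parallelism.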
We argue that partition awareness is an important attribute of future network applications and show that the view synchrony service we specify is indeed useful for supporting it. We do so by developing partition-aware solutions to test problems that are representative of particular classes of realistic applications [BDMS97].
Further information can be obtained from the Relacs web page.
[BDM97] O. Babaoglu, R. Davoli, and A. Montresor. A Partitionable Group Membership Service for Asynchronous Distributed Systems. Tech. Rep. UBLCS97-1, Dept. of Computer Science, University of Bologna, 1997.