--------------------------------------------------- Return-Path: sparekh@crocus Received: from crocus.cs.washington.edu (crocus.cs.washington.edu [128.95.1.67]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id XAA08193 for ; Mon, 20 Jan 1997 23:16:30 -0800 Received: (sparekh@localhost) by crocus.cs.washington.edu (8.6.12/7.2ws+) id XAA15080; Mon, 20 Jan 1997 23:16:28 -0800 Date: Mon, 20 Jan 1997 23:16:28 -0800 Message-Id: <199701210716.XAA15080@crocus.cs.washington.edu> From: Sujay Parekh To: bershad@cs Subject: Isis 552-reading summary While most systems are equipped with mechanisms to carry out reliable, fault-tolerant communication with another system, there are often cases where fault-tolerant semantics need to be defined when sending messages to a set of systems. This chapter describes several useful semantics, the protocols for providing such semantics and the costs and issues involved in using them. There are several key assumptions in the system model. Firstly, the point-to-point communication mechanism is assumed to be reliable and sequenced. Secondly, failures are assumed to be crash-stop. The existence of a failure detector is assumed. The first assumption is not unrealistic, since most systems provide it, while the failure model is acceptable since it makes the exposition simpler. Automatic mechanisms exist for enabling crash failure-resistant protocols to deal with a larger class of failures. The protocols for providing reliable broadcast can be evaluated based on several criteria: (a) the latency of delivering a message, (b) communication overhead for the protocol and (c) extra memory resources occupied in the network buffers of participating machines. The simplest broadcast semantics is that of atomic broadcast: either messages are delivered to all members of the broadcast group or to none. The simple protocol for this has low latency and communication overhead, but consumes memory resources. A stronger guarantee is that, in addition to atomicity, all messages are delivered in the same order everywhere. The ABCAST protocol for this has high latency of delivery, requiring a minimum of 2 rounds of communication before a message is delivered. A minimum of 2 extra messages must be exchanged between the initiator and each recipient before a message is delivered. Memory resources of a message are tied up from the time it is received till it is delivered at that node. A second set of protocols for providing the same semantics is described. These protocols use tokens to arbitrate message order. The costs are determined by the various token-related parameters of these protocols. A nice property is that by changing the rate of token-passing between processors, it is possible to trade communication cost with latency and storage costs. A third possibility is based on the idea of imposing a tree structure on the network and using this to arbitrate message order. Finally, there exist "optimistic" protocols that attempt to reduce communication overhead, but this is often at the expense of delivery latency. A less stringent requirement than totally ordered broadcast is that of causally-ordered broadcast: namely, only the causally related messages are required to be delivered in the same order everywhere. The basic protocol works by piggy-backing an event "history" on all message sends. Various optimizations can be performed that reduce the message overhead. This protocol, CBCAST, has 0 delivery latency. It incurs communication cost only in that message sizes are larger, but no extra messages need to be sent. Memory costs, however, are high. Finally, one may wish to have real-time guarantees on message delivery. The protocols for providing such semantics typically assume the knowledge of timing bounds for various system-related behavior. These protocols make extensive use of temporal knowledge to implicitly reach agreement on various aspects of message delivery. Consequently, their communication costs are lower. However, delivery latency (and storage costs) may depend on worst-case system behavior. One could make stronger assumptions about system performance and reduce this latency, but then under conditions of high load or network traffic, these assumptions stand an increased chance of being violated. An additional enhancement concerns the naming of broadcast groups. One can assume that a process explicitly names each member of a group. An alternative is to use a group name that refers to a network-wide dynamic set of members. In order to provide this flexibility, additional overhead must be incurred to ensure that all participants have a uniform view of group membership. --------------------------------------------------- Return-Path: rgrimm@cs.washington.edu Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA08604 for ; Tue, 21 Jan 1997 00:24:38 -0800 Received: from [128.95.8.129] (h129.dyn.cs.washington.edu [128.95.8.129]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id AAA25115 for ; Tue, 21 Jan 1997 00:24:35 -0800 X-Sender: rgrimm@june.cs.washington.edu Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Tue, 21 Jan 1997 00:22:17 -0800 To: bershad@cs From: rgrimm@cs.washington.edu (Robert Grimm) Subject: Isis 552-reading summary Joseph and Birman survey reliable broadcast protocols, focusing on three protocols developed for Isis (with additional material on real-time delivery protocols). The surveyed protocols are based on a simple failure model, namely the "crash model:" On failure, a machine immediately halts, but does not perform incorrect actions nor fails to perform actions it is supposed to execute. Building on this failure model, all three protocols are atomic, i.e., a broadcast message is either received by all destinations (that do not fail) or by none (which, additionally, may only occur if the sender fails before completion of the broadcast). However, the three protocols differ in their ordering properties, their usage and their cost or complexity: (1) The ABCAST protocol orders all messages from all machines (i.e., provides a total ordering), is the most general protocol and also the most complex. (2) The CBCAST protocol only guarantees a consistent ordering for all causally related messages, thus may not be useful to some applications but enables the immediate delivery of messages at the destination. (3) The GBCAST protocol is consistently ordered in respect to the first two protocols, is only used to communicate group membership changes, and its complexity depends on the number of sites that store group membership information. While Joseph and Birman certainly give a good introduction to and survey of reliable broadcast protocols, I would have liked to see some performance numbers of the Isis protocols to get a better feeling for how the performance of the protocols differs. Furthermore, a discussion of how to address other failure modes would have been helpful to understand the effects on protocol design (they mention translation techniques to the simple failure model, but what is the intuition behind them?). --------------------------------------------------- Return-Path: ddion@june Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA08656 for ; Tue, 21 Jan 1997 00:32:46 -0800 Received: (ddion@localhost) by june.cs.washington.edu (8.8.3+CSE/7.2ju) id AAA25426; Tue, 21 Jan 1997 00:32:46 -0800 From: ddion@june (David Dion) Message-Id: <199701210832.AAA25426@june.cs.washington.edu> Subject: Isis 552-reading summary To: bershad@cs Date: Tue, 21 Jan 1997 00:32:45 -0800 (PST) Cc: ddion@june (David Dion) X-Mailer: ELM [version 2.4 PL23] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit RPC is not ideal for all forms of communication in a distributed system. While RPC provides communication between a client and server process, a broadcast mechanism allows a process to send a message to a set of processes. What makes broadcast challenging is reacting to failure. Broadcast mechanisms that have well-defined behavior even during failures are reliable broadcasts. The performance of reliable broadcast protocols can be evaluated based on three factors: latency, the time between sending and delivering the message; storage, the buffer space consumed by the protocol; and communication overhead, the number of service messages required to reliably broadcast an actual message. Reliable broadcast protocols are designed to meet certain properties while optimizing performance. First, reliable broadcast protocols strive for atomicity. That is, a message is either received by all destinations that do not fail or by none of them. The next step beyond atomicity is atomicity with ordering. With ordering, not only is atomicity guaranteed, but the order in which these messages are delivered to destination processes is the same across all recipients. Two protocols are discussed that provide ordered broadcast. Both are rather expensive in performance, despite recent optimizations. By relaxing the ordering requirement, performance can improve significantly. ISIS implements a CBCAST, a Causal Broadcast system which guarantees that causally related messages will be delivered in the proper order, while non-causally related messages can be ordered arbitrarily. CBCAST is implemented by "piggy-backing" causally preceding messages on current messages. With clever optimizations, the need for piggy-backing can be reduced to a minimum, though storage requirements still remain high. Other reliable broadcast protocols make real-time delivery guarantees, in which delivery is guaranteed to occur virtually simultaneously. Real-time broadcast requires nodes of the system to have synchronized clocks, and protocols do not respond well to failure. Identifying the set of recipients of a broadcast message is also a challenging problem. Rather than enumerate the set of recipients, a sender should be able to specify a process group, or a logical set of processes whose membership may change with time. Messages sent during group membership changes should go either to the members before the change or after the change, but never to an intermediate membership. ISIS provides a group membership broadcast protocol, GBCAST, which is ordered relative to all other broadcast protocols. --------------------------------------------------- Return-Path: tian@wally Received: from wally.cs.washington.edu (wally.cs.washington.edu [128.95.2.122]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id BAA08986 for ; Tue, 21 Jan 1997 01:57:35 -0800 Received: (tian@localhost) by wally.cs.washington.edu (8.8.3+CSE/7.2ws+) id BAA25790; Tue, 21 Jan 1997 01:57:35 -0800 (PST) Date: Tue, 21 Jan 1997 01:57:35 -0800 (PST) From: Tian Lim To: bershad@cs Subject: Isis 552-reading summary Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Joseph and Birman focus on three reliable broadcast protocols in ISIS : atomic broadcast (ABCAST), causal broadcast (CBCAST), and global broadcast (GBCAST). They argue that a reliable broadcast mechanism is essential for distributed applications that do not fit the client-server model (RPC). The big problems with broadcasting are (a) making it resilient to failures and (b) meeting some measure of ordering guarantees. For (a), the authors chose the "crash model" as their main failure mode (a machine is either up or down but never performs incorrectly otherwise), arguing that its semantics are easy to implement above more complex failures. ABCAST provides atomicity - a broadcast is received by all destinations that do not fail, or by none of them (i.e. the sender fails). In addition all broadcasts are delivered in the same order across destination sites (this ordering is not necessarily the order the messages were sent). This is done by agreeing on a timestamp for each message before its delivery. CBCAST provides causal ordering - a message m will never be delivered before the messages that m is possibly causally dependent on. This is done by referencing the messages that m is possibly dependent on. GBCAST is ordered relative to ABCASTs and CBCASTs, and is used primarily for announcing group membership changes. This ordering ensures that other broadcasts will be seen by either the old group members or the new ones, but never some "intermediate" set. The authors describe other protocols, such as token based and real time protocols. They also mention how failure detectors can exacerbate overloaded sites with liveness probes, and how network characteristics can influence algorithm design, e.g. by leveraging ethernet broadcast. Two trends are evident in the authors' discourse. First, reliable broadcast costs in delivery latency, and additional communication and memory overhead. They show that these can be traded off against one another and against ordering guarantees, but unfortunately do not quantify the relationships with measurements. Second, there seems to be an implicit assumption that failures are rare, and the algorithms therefore attempt to push the costs into the failure cases. --------------------------------------------------- Return-Path: sungeun@wormwood Received: from wormwood.cs.washington.edu (wormwood.cs.washington.edu [128.95.2.107]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id HAA10550 for ; Tue, 21 Jan 1997 07:23:26 -0800 Received: (sungeun@localhost) by wormwood.cs.washington.edu (8.8.3+CSE/7.2ws+) id HAA19916; Tue, 21 Jan 1997 07:23:26 -0800 (PST) Date: Tue, 21 Jan 1997 07:23:26 -0800 (PST) Message-Id: <199701211523.HAA19916@wormwood.cs.washington.edu> From: Sung-Eun Choi To: bershad@cs Subject: Isis 552-reading summary Reply-To: sungeun@cs.washington.edu This paper describes protocols for providing "reliable" broadcasts in a distributed system. Reliability is determined by of the types of services offered (atomicity & different degrees of ordered-ness) and the behaviour of the system as a whole (resistance to failure and delivery time constraints). Precisely defining such protocols enables "optimizations" that may improve the performance of this otherwise heavy weight operation. For example, providing ordered broadcast messages on a per initiator basis can be simpler than providing a globally consistent ordering of messages. The services provided seem dictated by programmer use, though the behaviour of the system would seem to influence the types of services realistically providable. Alternatively, real-time delivery guarantees and handling failures require direct consideration into system use and physical environment. Thus guaranteeing any such behaviour is more likely to be a limitation. --------------------------------------------------- Return-Path: govindk@shasta.ee.washington.edu Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11078 for ; Tue, 21 Jan 1997 08:31:05 -0800 Received: from shasta.ee.washington.edu (shasta.ee.washington.edu [128.95.28.11]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id IAA13315 for ; Tue, 21 Jan 1997 08:31:04 -0800 Received: from andes.faulty (andes.ee.washington.edu) by shasta.ee.washington.edu (4.1/SMI-4.1) id AA04899; Tue, 21 Jan 97 08:32:12 GMT Received: by andes.faulty (SMI-8.6/SMI-SVR4) id IAA17384; Tue, 21 Jan 1997 08:31:55 -0800 Date: Tue, 21 Jan 1997 08:31:55 -0800 From: govindk@shasta.ee.washington.edu (Govindarajan K) Message-Id: <199701211631.IAA17384@andes.faulty> To: bershad@cs Subject: Isis 552-reading summary Groups are ubiqutous in human society. We often use collective names for groups of people as a convenient means for referring to or addressing some part of the population as if it were a single entity, like a school class an age group or a social category. Similarly goups can be used in distributed computing systems to help master the complexity of large applications or to help provide non-functional properties like availability or security. The paper presents different broadcast algorithms to achieve global consistency in the system in the scenario of messages received and the order in which they are received. The failure model considered in the system is initially the crash-failure model but it is later extended to partitioning of networks. Some recent publications however tend to model network failures as processor failures. Broadcast protocols are better suited for the purpose because RPC is more suited for the client server model, and not one server and multple recipient model considered here. The performance of broadcast protocols are measured in terms of the latency, the amount of buffering needed at the servers and the number of messages needed to ensure the task is done. The paper starts of by considering a simple protocol which just ensures atomicity. Then the paper goes on to consider protocols in which not only atomicity is maintained but also ordering of messages. Total ordering is expensive , however present day research has shown that much faster totally ordered protocols are possible. ISIS implements CBAST which messages are causally ordered. Here past messages are piggy backed on current messages. The paper discusses ways to optimize the number of messages passed and also methods to reduce the buffer space requirements. The paper later discusses a protocol in which group membership information is maintained to take care of partitioning of networks and inconsistency brought forth by the constant joining and leaving of members. These protocols however fail in the context of Byzantine faults. --------------------------------------------------- Return-Path: echris@merganser Received: from merganser.cs.washington.edu (merganser.cs.washington.edu [128.95.2.192]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11145 for ; Tue, 21 Jan 1997 08:33:53 -0800 Received: (echris@localhost) by merganser.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA16685; Tue, 21 Jan 1997 08:33:53 -0800 (PST) Date: Tue, 21 Jan 1997 08:33:53 -0800 (PST) Message-Id: <199701211633.IAA16685@merganser.cs.washington.edu> From: E Christopher Lewis To: bershad@cs Subject: Isis 552-reading summary Reply-To: echris@cs.washington.edu This chapter provides an overview of reliable broadcast protocol issues and implementations. The authors suggest that there are two axes used to characterize these protocols: failure tolerance and service guarantee. Along the first axis, the "crash model," omission failures, Byzantine failures, and unbounded delay "failures" are introduced. The "crash model" is assumed for the remainder of the chapter, and the authors argue that it is a simple yet realizable abstraction. On the second axis is service guarantee. This is the primary axis explored in the chapter, and services discussed include atomic broadcast, ordered broadcast, FIFO (from a single source) broadcast, causal ordered broadcast, broadcast with real-time guarantees (perhaps its own axis), and broadcast to dynamically changing groups. Each service provides different guarantees, and a distributed application developer needs to select a service appropriate for the application's needs. The topic that most interests (well, confuses) me received little ink in this chapter: the interaction between the two axes. In designing these protocols, I doubt it is as simple as selecting a failure model and assuming that abstraction can be implemented. Given the fact it is probably impossible to truly implement any reasonable failure model, there must be some play between the abstraction and the protocol. --------------------------------------------------- Return-Path: yasushi@silk Received: from silk.cs.washington.edu (silk.cs.washington.edu [128.95.2.238]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11349 for ; Tue, 21 Jan 1997 08:49:37 -0800 Received: (yasushi@localhost) by silk.cs.washington.edu (8.7.2/7.2ws+) id IAA09672; Tue, 21 Jan 1997 08:49:36 -0800 (PST) Date: Tue, 21 Jan 1997 08:49:36 -0800 (PST) From: yasushi@silk Message-Id: <199701211649.IAA09672@silk.cs.washington.edu> To: bershad@silk Subject: Isis 552-reading summary This paper describes multicasting protocols with various ordering guarantees. The atomic broadcast protocol ensures that all the recipient members see same total ordering of messages. The base version of this protocol requires two round trip between the sender and every receiver. The causal broadcast is a weaker form of multicast than atomic broadcast. It ensures the consistent ordering only between causally related messages. This is implemented by piggypacking the past messages with the new one. This protocol requires one round trip per each message, but space and message size overheads can be substantially reduced by assuming an upper limit on transmission latency. FIFO broadcast the weakest of all; it ensures the consistent ordering among messages from same sender. FIFO property is usually automatically guaranteed by the transport protocol. Real time multicast can be implemented by assuming the maximum transmission latency. Finally, the paper describes the mechanism to ensure the atomicity of the group membership changes. This is done by treating the group state notification as a special kind of message, and collectively guarantee the ordering such message and other kind of messages. --------------------------------------------------- Return-Path: matthai@franklin.cs.washington.edu Received: from franklin.cs.washington.edu (franklin.cs.washington.edu [128.95.2.103]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11454 for ; Tue, 21 Jan 1997 08:59:55 -0800 Received: from localhost (localhost [127.0.0.1]) by franklin.cs.washington.edu (8.8.3+CSE/7.2ws+) with SMTP id IAA19365 for ; Tue, 21 Jan 1997 08:59:55 -0800 (PST) Message-Id: <199701211659.IAA19365@franklin.cs.washington.edu> X-Mailer: exmh version 1.5.3 12/28/94 To: bershad@cs Subject: Isis 552-reading summary Reply-to: Matthai Philipose Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Tue, 21 Jan 1997 08:59:54 PST From: Matthai Philipose In distributed systems, the need may arise for one process to broadcast a message to many others. Compared to the single process-to-single-process case, at least two complications may arise: --The messages may not be ordered "consistently" at all processes. --If the broadcast fails, all destination processes of the broadcast may not perceive the failure in a consistent manner. Birman and Joseph describe four models of broadcast (atomic broadcast, ordered atomic broadcast (ABCAST), causally ordered brodcast (CBCAST) and Group Broadcast (GBCAST)) which have concrete, though different, order guarantees and failure semantics. In describing the algorithms, B & J make at least the following assumptions about the distributed system: --The underlying communication layer provides reliable point to point communication. --Failures happen only in the form of processors crashing and stopping. --Failures are detectible by other processors in the communicating group. They indicate that more complex failure models can be transformed into the above "crash model". Further, unreliability in the communication layer may be modelled by more complex processor failure models. Given the above assumptions, B & J go on to describe informally protocols to implement the four broadcast models above. For each protocol, their strategy is to describe a simple, easy-to-prove-correct, but inefficient algorithm and then to informally add modifications to the algorithm to make it more efficient. They measure efficiency in terms of latency, volume of communication and amount of storage required by these algorithms. Some of the strategies they use to make their algorithms faster are: 1)Assuming that failure is uncommon, and pushing high-overhead operations to the case where failure actually happens. 2)Assuming that a given process probably initiates many rounds of communication in rapid succession, and piggybacking control information related to one round of communication with the data for the next round. 3)Relaxing the consistency model used. For instance, in many problems, causal ordering may suffice instead of total ordering. If causal ordering is used, the common case of a processor sending a messsage to itself as part of a broadcast may be optimized for latency. It is plausible that using these primitives makes distributed programming easier. Indeed, the difficulty of convincing myself that even the simplest of these protocols (atomic broadcasts) does "what it is supposed to" convinces me that rolling your own broadcast primitives and getting it correct may be very difficult. On the other hand, I would like to see how naturally the primitives presented in the articles map on to real applications. Even though the semantics (both success and failure) of the primitive seem to be concrete, they are still fairly subtle, and it could still take a lot of care to build a correct system using them. Hopefully, the majority of distributed applications either require simple applications of these primitives or do not require them at all!