---------------------------------------------------
Return-Path: sparekh@crocus
Received: from crocus.cs.washington.edu (crocus.cs.washington.edu [128.95.1.67]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id XAA08193 for <bershad@whistler.cs.washington.edu>; Mon, 20 Jan 1997 23:16:30 -0800
Received: (sparekh@localhost) by crocus.cs.washington.edu (8.6.12/7.2ws+) id XAA15080; Mon, 20 Jan 1997 23:16:28 -0800
Date: Mon, 20 Jan 1997 23:16:28 -0800
Message-Id: <199701210716.XAA15080@crocus.cs.washington.edu>
From: Sujay Parekh <sparekh@cs.washington.edu>
To: bershad@cs
Subject: Isis 552-reading summary


While most systems are equipped with mechanisms to carry out reliable,
fault-tolerant communication with another system, there are often
cases where fault-tolerant semantics need to be defined when sending
messages to a set of systems.  This chapter describes several useful
semantics, the protocols for providing such semantics and the costs
and issues involved in using them.  There are several key assumptions
in the system model.  Firstly, the point-to-point communication
mechanism is assumed to be reliable and sequenced.  Secondly, failures
are assumed to be crash-stop. The existence of a failure detector is
assumed.  The first assumption is not unrealistic, since most systems
provide it, while the failure model is acceptable since it makes the
exposition simpler.  Automatic mechanisms exist for enabling crash
failure-resistant protocols to deal with a larger class of failures.
The protocols for providing reliable broadcast can be evaluated based
on several criteria: (a) the latency of delivering a message, (b)
communication overhead for the protocol and (c) extra memory resources
occupied in the network buffers of participating machines.

The simplest broadcast semantics is that of atomic broadcast: either
messages are delivered to all members of the broadcast group or to
none.  The simple protocol for this has low latency and communication
overhead, but consumes memory resources.

A stronger guarantee is that, in addition to atomicity, all messages
are delivered in the same order everywhere.  The ABCAST protocol for
this has high latency of delivery, requiring a minimum of 2 rounds of
communication before a message is delivered.  A minimum of 2 extra
messages must be exchanged between the initiator and each recipient
before a message is delivered.  Memory resources of a message are tied
up from the time it is received till it is delivered at that node.  A
second set of protocols for providing the same semantics is described.
These protocols use tokens to arbitrate message order.  The costs are
determined by the various token-related parameters of these protocols.
A nice property is that by changing the rate of token-passing between
processors, it is possible to trade communication cost with latency
and storage costs.  A third possibility is based on the idea of
imposing a tree structure on the network and using this to arbitrate
message order.  Finally, there exist "optimistic" protocols that
attempt to reduce communication overhead, but this is often at the
expense of delivery latency.

A less stringent requirement than totally ordered broadcast is that of
causally-ordered broadcast: namely, only the causally related messages
are required to be delivered in the same order everywhere.  The basic
protocol works by piggy-backing an event "history" on all message
sends.  Various optimizations can be performed that reduce the message
overhead.  This protocol, CBCAST, has 0 delivery latency.  It incurs
communication cost only in that message sizes are larger, but no extra
messages need to be sent.  Memory costs, however, are high.

Finally, one may wish to have real-time guarantees on message
delivery.  The protocols for providing such semantics typically assume
the knowledge of timing bounds for various system-related behavior.
These protocols make extensive use of temporal knowledge to implicitly
reach agreement on various aspects of message delivery.  Consequently,
their communication costs are lower.  However, delivery latency (and
storage costs) may depend on worst-case system behavior.  One could
make stronger assumptions about system performance and reduce this
latency, but then under conditions of high load or network traffic,
these assumptions stand an increased chance of being violated.

An additional enhancement concerns the naming of broadcast groups.
One can assume that a process explicitly names each member of a group.
An alternative is to use a group name that refers to a network-wide
dynamic set of members.  In order to provide this flexibility,
additional overhead must be incurred to ensure that all participants
have a uniform view of group membership.
---------------------------------------------------
Return-Path: rgrimm@cs.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA08604 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 00:24:38 -0800
Received: from [128.95.8.129] (h129.dyn.cs.washington.edu [128.95.8.129]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id AAA25115 for <bershad@cs>; Tue, 21 Jan 1997 00:24:35 -0800
X-Sender: rgrimm@june.cs.washington.edu
Message-Id: <v02140b01af0a2d212ee6@[128.95.8.127]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Tue, 21 Jan 1997 00:22:17 -0800
To: bershad@cs
From: rgrimm@cs.washington.edu (Robert Grimm)
Subject: Isis 552-reading summary

Joseph and Birman survey reliable broadcast protocols, focusing on
three protocols developed for Isis (with additional material on
real-time delivery protocols). The surveyed protocols are based on a
simple failure model, namely the "crash model:" On failure, a machine
immediately halts, but does not perform incorrect actions nor fails to
perform actions it is supposed to execute. Building on this failure
model, all three protocols are atomic, i.e., a broadcast message is
either received by all destinations (that do not fail) or by none
(which, additionally, may only occur if the sender fails before
completion of the broadcast).  However, the three protocols differ in
their ordering properties, their usage and their cost or complexity:
(1) The ABCAST protocol orders all messages from all machines (i.e.,
provides a total ordering), is the most general protocol and also the
most complex. (2) The CBCAST protocol only guarantees a consistent
ordering for all causally related messages, thus may not be useful to
some applications but enables the immediate delivery of messages at
the destination. (3) The GBCAST protocol is consistently ordered in
respect to the first two protocols, is only used to communicate group
membership changes, and its complexity depends on the number of sites
that store group membership information.

While Joseph and Birman certainly give a good introduction to and
survey of reliable broadcast protocols, I would have liked to see some
performance numbers of the Isis protocols to get a better feeling for
how the performance of the protocols differs. Furthermore, a
discussion of how to address other failure modes would have been
helpful to understand the effects on protocol design (they mention
translation techniques to the simple failure model, but what is the
intuition behind them?).


---------------------------------------------------
Return-Path: ddion@june
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA08656 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 00:32:46 -0800
Received: (ddion@localhost) by june.cs.washington.edu (8.8.3+CSE/7.2ju) id AAA25426; Tue, 21 Jan 1997 00:32:46 -0800
From: ddion@june (David Dion)
Message-Id: <199701210832.AAA25426@june.cs.washington.edu>
Subject: Isis 552-reading summary
To: bershad@cs
Date: Tue, 21 Jan 1997 00:32:45 -0800 (PST)
Cc: ddion@june (David Dion)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


RPC is not ideal for all forms of communication in a distributed
system.  While RPC provides communication between a client and server
process, a broadcast mechanism allows a process to send a message to a
set of processes.  What makes broadcast challenging is reacting to
failure.  Broadcast mechanisms that have well-defined behavior even
during failures are reliable broadcasts.

The performance of reliable broadcast protocols can be evaluated based
on three factors: latency, the time between sending and delivering the
message; storage, the buffer space consumed by the protocol; and
communication overhead, the number of service messages required to
reliably broadcast an actual message.  Reliable broadcast protocols
are designed to meet certain properties while optimizing performance.

First, reliable broadcast protocols strive for atomicity.  That is, a
message is either received by all destinations that do not fail or by
none of them.  The next step beyond atomicity is atomicity with
ordering.  With ordering, not only is atomicity guaranteed, but the
order in which these messages are delivered to destination processes
is the same across all recipients.  Two protocols are discussed that
provide ordered broadcast.  Both are rather expensive in performance,
despite recent optimizations.  By relaxing the ordering requirement,
performance can improve significantly.  ISIS implements a CBCAST, a
Causal Broadcast system which guarantees that causally related
messages will be delivered in the proper order, while non-causally
related messages can be ordered arbitrarily.  CBCAST is implemented by
"piggy-backing" causally preceding messages on current messages.  With
clever optimizations, the need for piggy-backing can be reduced to a
minimum, though storage requirements still remain high.  Other
reliable broadcast protocols make real-time delivery guarantees, in
which delivery is guaranteed to occur virtually simultaneously.
Real-time broadcast requires nodes of the system to have synchronized
clocks, and protocols do not respond well to failure.

Identifying the set of recipients of a broadcast message is also a
challenging problem.  Rather than enumerate the set of recipients, a
sender should be able to specify a process group, or a logical set of
processes whose membership may change with time.  Messages sent during
group membership changes should go either to the members before the
change or after the change, but never to an intermediate membership.
ISIS provides a group membership broadcast protocol, GBCAST, which is
ordered relative to all other broadcast protocols.
---------------------------------------------------
Return-Path: tian@wally
Received: from wally.cs.washington.edu (wally.cs.washington.edu [128.95.2.122]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id BAA08986 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 01:57:35 -0800
Received: (tian@localhost) by wally.cs.washington.edu (8.8.3+CSE/7.2ws+) id BAA25790; Tue, 21 Jan 1997 01:57:35 -0800 (PST)
Date: Tue, 21 Jan 1997 01:57:35 -0800 (PST)
From: Tian Lim <tian@wally.cs.washington.edu>
To: bershad@cs
Subject: Isis 552-reading summary
Message-ID: <Pine.OSF.3.91.970121015649.25639A-100000@wally.cs.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Joseph and Birman focus on three reliable broadcast protocols in ISIS
: atomic broadcast (ABCAST), causal broadcast (CBCAST), and global
broadcast (GBCAST).  They argue that a reliable broadcast mechanism is
essential for distributed applications that do not fit the
client-server model (RPC).  The big problems with broadcasting are (a)
making it resilient to failures and (b) meeting some measure of
ordering guarantees.  For (a), the authors chose the "crash model" as
their main failure mode (a machine is either up or down but never
performs incorrectly otherwise), arguing that its semantics are easy
to implement above more complex failures.

ABCAST provides atomicity - a broadcast is received by all
destinations that do not fail, or by none of them (i.e. the sender
fails).  In addition all broadcasts are delivered in the same order
across destination sites (this ordering is not necessarily the order
the messages were sent).  This is done by agreeing on a timestamp for
each message before its delivery.  CBCAST provides causal ordering - a
message m will never be delivered before the messages that m is
possibly causally dependent on.  This is done by referencing the
messages that m is possibly dependent on.  GBCAST is ordered relative
to ABCASTs and CBCASTs, and is used primarily for announcing group
membership changes.  This ordering ensures that other broadcasts will
be seen by either the old group members or the new ones, but never
some "intermediate" set.

The authors describe other protocols, such as token based and real
time protocols.  They also mention how failure detectors can
exacerbate overloaded sites with liveness probes, and how network
characteristics can influence algorithm design, e.g. by leveraging
ethernet broadcast.

Two trends are evident in the authors' discourse.  First, reliable
broadcast costs in delivery latency, and additional communication and
memory overhead.  They show that these can be traded off against one
another and against ordering guarantees, but unfortunately do not
quantify the relationships with measurements.  Second, there seems to
be an implicit assumption that failures are rare, and the algorithms
therefore attempt to push the costs into the failure cases.


---------------------------------------------------
Return-Path: sungeun@wormwood
Received: from wormwood.cs.washington.edu (wormwood.cs.washington.edu [128.95.2.107]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id HAA10550 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 07:23:26 -0800
Received: (sungeun@localhost) by wormwood.cs.washington.edu (8.8.3+CSE/7.2ws+) id HAA19916; Tue, 21 Jan 1997 07:23:26 -0800 (PST)
Date: Tue, 21 Jan 1997 07:23:26 -0800 (PST)
Message-Id: <199701211523.HAA19916@wormwood.cs.washington.edu>
From: Sung-Eun Choi <sungeun@cs.washington.edu>
To: bershad@cs
Subject: Isis 552-reading summary
Reply-To: sungeun@cs.washington.edu

This paper describes protocols for providing "reliable" broadcasts in
a distributed system.  Reliability is determined by of the types of
services offered (atomicity & different degrees of ordered-ness) and
the behaviour of the system as a whole (resistance to failure and
delivery time constraints).  Precisely defining such protocols enables
"optimizations" that may improve the performance of this otherwise
heavy weight operation.  For example, providing ordered broadcast
messages on a per initiator basis can be simpler than providing a
globally consistent ordering of messages.

The services provided seem dictated by programmer use, though the
behaviour of the system would seem to influence the types of services
realistically providable.  Alternatively, real-time delivery
guarantees and handling failures require direct consideration into
system use and physical environment.  Thus guaranteeing any such
behaviour is more likely to be a limitation.
---------------------------------------------------
Return-Path: govindk@shasta.ee.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11078 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 08:31:05 -0800
Received: from shasta.ee.washington.edu (shasta.ee.washington.edu [128.95.28.11]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id IAA13315 for <bershad@cs>; Tue, 21 Jan 1997 08:31:04 -0800
Received: from andes.faulty (andes.ee.washington.edu) by shasta.ee.washington.edu (4.1/SMI-4.1)
	id AA04899; Tue, 21 Jan 97 08:32:12 GMT
Received: by andes.faulty (SMI-8.6/SMI-SVR4)
	id IAA17384; Tue, 21 Jan 1997 08:31:55 -0800
Date: Tue, 21 Jan 1997 08:31:55 -0800
From: govindk@shasta.ee.washington.edu (Govindarajan K)
Message-Id: <199701211631.IAA17384@andes.faulty>
To: bershad@cs
Subject: Isis 552-reading summary

Groups are ubiqutous in human society. We often use collective
names for groups of people as a convenient means for referring
to or addressing some part of the population as if it were a 
single entity, like a school class an age group or a social category.
Similarly goups can be used in distributed computing systems
to help master the complexity of large applications or to help 
provide non-functional properties like availability or security.

The paper presents different broadcast algorithms to achieve 
global consistency in the system in the scenario of messages 
received and the order in which they are received. The failure
model considered in the system is initially the crash-failure model
 but it is later extended to partitioning of networks. Some recent
 publications however tend to model network failures as processor
 failures. 

 Broadcast protocols are better suited for the purpose because
 RPC is more suited for the client server model, and not one server
 and multple recipient model considered here.

 The performance of broadcast protocols are measured in terms of
 the latency, the amount of buffering needed at the servers and
 the number of messages needed to ensure the task is done.

 The paper starts of by considering a simple protocol which
 just ensures atomicity. Then the paper goes on to consider
 protocols in which not only atomicity is maintained but also
 ordering of messages. Total ordering is expensive , however
 present day research has shown that much faster totally ordered
 protocols are possible.
 ISIS implements CBAST which messages are causally ordered. Here
 past messages are piggy backed on current messages.
 The paper discusses ways to optimize the number of messages passed
 and also methods to reduce the buffer space requirements. The paper
 later discusses a protocol in which group membership information
 is maintained to take care of partitioning of networks and
 inconsistency brought forth by the constant joining and leaving
 of members.

 These protocols however fail in the context of Byzantine faults.
---------------------------------------------------
Return-Path: echris@merganser
Received: from merganser.cs.washington.edu (merganser.cs.washington.edu [128.95.2.192]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11145 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 08:33:53 -0800
Received: (echris@localhost) by merganser.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA16685; Tue, 21 Jan 1997 08:33:53 -0800 (PST)
Date: Tue, 21 Jan 1997 08:33:53 -0800 (PST)
Message-Id: <199701211633.IAA16685@merganser.cs.washington.edu>
From: E Christopher Lewis <echris@cs.washington.edu>
To: bershad@cs
Subject: Isis 552-reading summary 
Reply-To: echris@cs.washington.edu

This chapter provides an overview of reliable broadcast protocol issues
and implementations.  The authors suggest that there are two axes used to
characterize these protocols: failure tolerance and service guarantee.
Along the first axis, the "crash model," omission failures, Byzantine
failures, and unbounded delay "failures" are introduced.  The "crash
model" is assumed for the remainder of the chapter, and the authors
argue that it is a simple yet realizable abstraction.

On the second axis is service guarantee.  This is the primary axis
explored in the chapter, and services discussed include atomic
broadcast, ordered broadcast, FIFO (from a single source) broadcast,
causal ordered broadcast, broadcast with real-time guarantees (perhaps
its own axis), and broadcast to dynamically changing groups.  Each
service provides different guarantees, and a distributed application
developer needs to select a service appropriate for the application's
needs.

The topic that most interests (well, confuses) me received little ink in
this chapter: the interaction between the two axes.  In designing these
protocols, I doubt it is as simple as selecting a failure model and
assuming that abstraction can be implemented.  Given the fact it is
probably impossible to truly implement any reasonable failure model,
there must be some play between the abstraction and the protocol.
---------------------------------------------------
Return-Path: yasushi@silk
Received: from silk.cs.washington.edu (silk.cs.washington.edu [128.95.2.238]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11349 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 08:49:37 -0800
Received: (yasushi@localhost) by silk.cs.washington.edu (8.7.2/7.2ws+) id IAA09672; Tue, 21 Jan 1997 08:49:36 -0800 (PST)
Date: Tue, 21 Jan 1997 08:49:36 -0800 (PST)
From: yasushi@silk
Message-Id: <199701211649.IAA09672@silk.cs.washington.edu>
To: bershad@silk
Subject: Isis 552-reading summary 

This paper describes multicasting protocols with various ordering
guarantees. The atomic broadcast protocol ensures that all the
recipient members see same total ordering of messages. The base
version of this protocol requires two round trip between the sender
and every receiver. The causal broadcast is a weaker form of multicast
than atomic broadcast. It ensures the consistent ordering only between
causally related messages. This is implemented by piggypacking the
past messages with the new one. This protocol requires one round trip
per each message, but space and message size overheads can be
substantially reduced by assuming an upper limit on transmission
latency.  FIFO broadcast the weakest of all; it ensures the consistent
ordering among messages from same sender.  FIFO property is usually
automatically guaranteed by the transport protocol.

Real time multicast can be implemented by assuming the maximum
transmission latency. Finally, the paper describes the mechanism to
ensure the atomicity of the group membership changes. This is done by
treating the group state notification as a special kind of message,
and collectively guarantee the ordering such message and other kind of
messages.


---------------------------------------------------
Return-Path: matthai@franklin.cs.washington.edu
Received: from franklin.cs.washington.edu (franklin.cs.washington.edu [128.95.2.103]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA11454 for <bershad@whistler.cs.washington.edu>; Tue, 21 Jan 1997 08:59:55 -0800
Received: from localhost (localhost [127.0.0.1]) by franklin.cs.washington.edu (8.8.3+CSE/7.2ws+) with SMTP id IAA19365 for <bershad@cs>; Tue, 21 Jan 1997 08:59:55 -0800 (PST)
Message-Id: <199701211659.IAA19365@franklin.cs.washington.edu>
X-Mailer: exmh version 1.5.3 12/28/94
To: bershad@cs
Subject: Isis 552-reading summary
Reply-to: Matthai Philipose <matthai@cs.washington.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Tue, 21 Jan 1997 08:59:54 PST
From: Matthai Philipose <matthai@franklin.cs.washington.edu>

In distributed systems, the need may arise for one process
to broadcast a message to many others. Compared to the single
process-to-single-process case, at least two complications
may arise:
--The messages may not be ordered "consistently" at all processes.
--If the broadcast fails, all destination processes of the broadcast
    may not perceive the failure in a consistent manner.
Birman and Joseph describe four models of broadcast (atomic broadcast, 
ordered atomic broadcast (ABCAST), causally ordered brodcast (CBCAST) and 
Group Broadcast (GBCAST)) which have concrete, though different,
order guarantees and failure semantics. In describing the algorithms,
B & J make at least the following assumptions about the distributed
system:
--The underlying communication layer provides reliable point to point 
communication.
--Failures happen only in the form of processors crashing and stopping.
--Failures are detectible by other processors in the communicating group.
They indicate that more complex failure models can be transformed into
the above "crash model". Further, unreliability in the communication layer
may be modelled by more complex processor failure models.


Given the above assumptions, B & J go on to describe informally protocols
to implement the four broadcast models above. For each protocol, their 
strategy 
is to describe a simple, easy-to-prove-correct, but inefficient algorithm and
then to informally add modifications to the algorithm to make it more 
efficient.
They measure efficiency in terms of latency, volume of communication and amount
of storage required by these algorithms. Some of the strategies they use to 
make their 
algorithms faster are:
1)Assuming that failure is uncommon, and  pushing high-overhead operations to 
the
  case where failure actually happens.
2)Assuming that a given process probably initiates many  rounds of 
communication
  in rapid succession, and piggybacking control information related to one 
round of
  communication with the data for the next round.
3)Relaxing the consistency model used. For instance, in many problems, causal
  ordering may suffice instead of total ordering. If causal ordering is used,
  the common case of a processor sending a messsage to itself as part of a 
  broadcast may be optimized for latency.


It is plausible that using these primitives makes distributed programming 
easier.
Indeed, the difficulty of convincing myself that even the simplest of these
protocols (atomic broadcasts) does "what it is supposed to" convinces me that
rolling your own broadcast primitives and getting it correct may be very
difficult. On the other hand, I would like to see how naturally the primitives
presented in the articles map on to real applications. Even though the 
semantics
(both success and failure) of the primitive seem to be concrete, they are still
fairly subtle, and it could still take a lot of care to build a correct system
using them. Hopefully, the majority of distributed applications
either require simple applications of these primitives or do not require them 
at
all!