Readings Page for Isis

Click here for the paper. (If no paper appears, it's because the paper isn't available online.)

Background information

Tons of material on Isis is available, but most of it is hard to get. Birman's running TOCS these days and is trying to set a good example by not putting published papers online.

I don't have the main paper for the Isis reading online yet, but a pretty good paper describing Isis's process group approach can be found here.

Another paper of interest is Cheriton and Skeen's SOSP '93 paper, which criticizes CATOCS (causally and totally ordered communication support). Their complaints are interesting. Paper is here. My slides on this are here.
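CATOCS refers to group multicast that delivers messages in causal or total order, the guarantees Isis offers through its CBCAST and ABCAST primitives. For readers who haven't seen causal delivery before, here is a minimal sketch of the usual vector-timestamp delivery rule. It assumes a fixed group of n members, channels that eventually deliver every message, and no membership changes; the class and method names are hypothetical, and this is not the Isis implementation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int       # index of the sending member
    vt: list[int]     # sender's vector timestamp when it sent the message
    payload: str


class CausalMember:
    """One member of a fixed n-member group that delivers multicasts in causal order."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vt = [0] * n                    # vt[j] = messages from member j delivered here
        self.pending: list[Message] = []     # received but not yet deliverable
        self.delivered: list[str] = []       # payloads in delivery order

    def send(self, payload: str) -> Message:
        """Stamp a new multicast; the sender delivers its own message at once."""
        self.vt[self.pid] += 1
        self.delivered.append(payload)
        return Message(self.pid, list(self.vt), payload)

    def _deliverable(self, m: Message) -> bool:
        # m must be the next message from its sender, and everything the sender
        # had delivered from other members must already be delivered here.
        if m.vt[m.sender] != self.vt[m.sender] + 1:
            return False
        return all(m.vt[k] <= self.vt[k] for k in range(len(self.vt)) if k != m.sender)

    def receive(self, m: Message) -> None:
        """Buffer m, then deliver every pending message whose causal past is satisfied."""
        self.pending.append(m)
        progress = True
        while progress:
            progress = False
            for p in list(self.pending):
                if self._deliverable(p):
                    self.pending.remove(p)
                    self.vt[p.sender] += 1
                    self.delivered.append(p.payload)
                    progress = True


# P0 multicasts an update; P1 sees it and multicasts a reply; the network
# hands the reply to P2 before the update it depends on.
p0, p1, p2 = CausalMember(0, 3), CausalMember(1, 3), CausalMember(2, 3)
update = p0.send("update")
p1.receive(update)
reply = p1.send("reply")
p2.receive(reply)
print(p2.delivered)   # [] : the reply is held back
p2.receive(update)
print(p2.delivered)   # ['update', 'reply']
```

Holding the reply back until the update arrives is exactly the kind of ordering the Cheriton and Skeen paper argues belongs at the application level rather than in the communication layer.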

From: birman@dag.uni-sb.de
Organization: Universitaet des Saarlandes
              D-66041 Saarbruecken, Germany
Date: Wed, 7 Sep 94 07:35:22 +0200
To: bershad@cs.washington.edu
Subject: Re:  recent article in IEEE Computer (Or was it CACM?)

Brian,

With all the copyright issues I am generally a little careful about
leaving things online.  But the paper was in the Dec. 1993 CACM and
is reprinted in the book RVR and I put out (with the figure fixed).
I am very far offline right now, at Dagstuhl, but in fact I did leave
the .ps for this file on ftp.cs.cornell.edu in the pub area, either
called TR-93-xxxx.ps.Z or perhaps in some sensible looking symbolic
link to that name, probably in the pub/isis area.  The paper itself
was called "The process group approach to reliable distributed computing"
and I agree that it made more sense than most Isis papers prior to it...

If you have a student hunt for the paper they should get the TR number
from the Cornell TR list in the home page and if WWW cooperates the
paper may still be online through the web.  But if not, and if they
note the TR number, the file is probably just lying there.  It might
be TR91 or TR92 (92-1296?) but with that screwed up figure I think
I convinced myself to leave the TR copy around for people who don't
like to cut and paste.  The CACM copy itself has a version of Figure
4 that the CACM people decided to "correct and simplify" for me.  Bless
their hearts...

Ken



To: cs552@cs
Subject: The CATOCS  Response
Date: Thu, 23 Jan 1997 10:36:13 PST
From: Brian Bershad 



I mentioned the response paper from the Isis group to the paper I presented today.
I'll have handouts of the OSR paper made.
Robbert Van Renesse (from the ISIS group) has this to add.


------- Forwarded Message

Date:    Thu, 23 Jan 1997 12:23:26 -0500
From:    Robbert VanRenesse 
To:      bershad@cs.washington.edu
Subject: Re: tongue in cheek response to Cheriton's SOSP paper.

In retrospect, I think both our responses were wrong.  The right response
is really quite simple:

*) the end-to-end argument provides safety in the face of at most a single
   failure in an unreplicated setting:  in case of the famous file server
   example it makes sure that the file is stored uncorruptedly on the
   server's disk.  It does not deal with liveness--if the server is dead,
   no file is written.  Two failures can cause a correct checksum to be
   sent to the client with a corrupted file on disk.

*) using replication and voting, no end-to-end argument is required.  Not
   only does it tolerate t failures with 2t + 1 replicas, it does both
   safety and liveness.  However, you need to make sure that updates are
   done in the same order at all replicas.

*) if you use active (non-centralized) replication, you need a total ordering
   protocol of some kind.  Two-phase commit + two-phase locking will give you
   this, but it is unnecessarily complex and blocks more often than necessary.

*) if you use passive replication (primary backup), you need to make sure
   when you roll-over from a crashed primary to the backup, that updates
   from the old primary are not delayed beyond updates from the new one.
   This is exactly the causal ordering requirement.

Could you please forward this to your students, in addition to the response
that I wrote.  By the way, both my and Cooper's (and Ken's) responses were
published in ACM SIGOPS OSR vol 28 no 1 (Jan 94).  Another response by Santosh
Shrivastava was published in OSR vol 28 no 4 (Oct 94).
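Van Renesse's second point claims that voting over 2t + 1 replicas masks t failures while giving both safety and liveness. A minimal sketch of the client-side vote makes the counting argument concrete. It assumes (our assumption, not his) that a faulty replica either stays silent or returns a wrong value, that each replica replies at most once, and that correct replicas have applied the same updates in the same order, so they return identical answers. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def replicas_needed(t: int) -> int:
    """Replicas required to mask t failures by voting."""
    return 2 * t + 1


def vote(replies: Iterable[str], t: int) -> Optional[str]:
    """Accept the first value reported by t + 1 replicas.

    Assumed failure model: each faulty replica either sends nothing (crash)
    or sends a wrong value, and each replica replies at most once. With at
    most t faulty replicas out of 2t + 1:
      * liveness: at least t + 1 correct replicas answer, so the client
        never has to wait on a dead one;
      * safety: at most t wrong replies exist, so no wrong value can
        reach t + 1 votes.
    """
    counts = Counter()
    for value in replies:          # replies in arrival order
        counts[value] += 1
        if counts[value] == t + 1:
            return value
    return None                    # too few matching replies: more than t failures


print(replicas_needed(1))                 # 3
print(vote(["x", "v", "v"], t=1))         # v : one corrupted reply is outvoted
print(vote(["v", "v"], t=1))              # v : one crashed replica doesn't block
print(vote(["x", "x", "v"], t=1))         # x : two failures exceed the t = 1 budget
```

The last call shows the bound is tight. The sketch also leans entirely on the same-order assumption: if replicas apply updates in different orders, correct replicas can disagree and no value may reach t + 1 votes, which is why the remaining bullets turn to total and causal ordering.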

Slides

Postscript or HTML

Note: some of the dept's UNIX machines won't display this postscript. It will print, however, and it can be viewed from any NT machine.

Your comments

Please submit your comments on this paper by sending electronic mail:
To: bershad@cs
Subject: Isis 552-reading summary
(I suggest you just yank the above subject line to make sure there's no problem).

Everyone's comments

You can also read everyone else's comments. (This provides an incentive to be both informative and well-written.)

Click here to see what other people in the class had to say about the paper.