|
|
|
|
Homework 5
Out: Monday March 2
Due: Monday March 9, 11:59PM
Penalty-free Extension: Friday March 13, 6:00PM
Turnin: Online
HW5 Wiki: Arriving shortly. (Anyone can create it...)
Javadoc for hw5 classes: It's here.
Assignment Goals
This final assignment brings us back around to the beginning of the quarter:
we're once again building an application. The specific goals are:
- Get some experience using TCP: reliable, stream-oriented communiation.
TCP underlies most popular, deployed protocols (e.g., HTTP for web pages).
- Get a chance to experience design once again. This time we're designing
an appliation level protocol:
message formats, and to some extent message exchange sequences.
- Learn about a distributed agreement protocol: two-phase commit.
- A collateral benefit: experience the joys of using a database, instead
of files, and particularly the management-free joy of using
SQLite.
(The assignment is designed so that if you really don't want to know anything
about this, you don't have to.)
Teams
We'll work on this assignment in teams.
Your team should have two people on it.
If you feel you must, your team can have one person on it.
If you feel it will help you understand the material in this assignment,
it can also have three people on it.
The goal of teams for this assignment is all about having someone to
talk with about the issues. The workload hasn't been specifically
designed to break up into two pieces, and in any case working mainly
separately is just the opposite of what I hope will happen. There is
enough going on here that talking will help. (Even if you're working
alone, try explaining carefully what you're doing to an imaginary
teammate. It'll help.)
Once you have a team, please send mail to the entire course staff
indicating who is on your team. (It would be helpful if the subject
line included "hw5 team".) We'll set up a Unix group that includes the
members of your team, and a file system area (accessible from the attu's)
where your shared files can reside. I recommend using a source control
system, but it's up to you.
(I like cvs; many other people seem to like subversion.)
We can help with information if you'd like to use source control but don't
remember how to set things up and find the online information less than clear,
CSE 303 being by now a distant memory.
Abstract Motivation
This is a description of a kind of system one might consider building.
The actual assignment is the central kernel of this application.
We are by no means building the application described here, though.
Alice, Bob, and Charlie (and Dave and etc.)
would like to be able to schedule meetings with
each other. Ideally, each should be able to create a meeting and enter
it into the calendars of everyone invited.
Unfortunately,
Alice, Bob, and Charlie all use different calendaring sytems. For instance,
Alice uses Google calendars, Bob uses Yahoo calendars, and Charlie uses
Exchange.
To achieve the shared scheduling function, we imagine building "an overlay" -
a distributed implementation that sits on top of the individual calendaring
programs each participant uses. Each participant runs a process that
accepts meeting scheduling requests from other processes in the overlay.
Each such process determines whether or not the proposed meeting time is free,
and if so enters the meeting in the underlying calendaring system.
There are therefore two parts to the this overlay: communicating with other
overlay processes to exchange meeting information, and translating the
scheduling events communicated by the overlay into commands for the per-participant
underlying calendar system.
Our goal is to build this overlay.
What We're Actually Doing
In a nutshell, we're building the communication that allows
joint scheduling of meetings. We're not building the code that
interfaces to heterogeneous, real, calendaring systems. Instead,
we're using a simple database as the calendar backend.
We begin with some skeleton code (in Java). The code is both incomplete
and incorrect, as will be explained. Our goal is to complete and correct
it.
The skeleton code implements a distributed application. Each participant
(Alice, Bob, etc.) runs a copy, on whatever machine s/he wants.
Each participant can create meeting requests. A meeting request contains:
- A meeting time. To simplify our lives, a meeting time is a "slot" -
the infinite future has been divided into a finite number of slots. Each
meeting occupies exactly one slot. No two meetings involving any participant
can happen in the same slot. Slots are named with integers: 0, 1, ..., N-1.
- A meeting description: an arbitrary string.
- A list of participants ('alice', 'bob', etc.).
In the skeletal code, meeting requests are created at random: in a random slot,
and involving randomly selected participants. The skeleton code insists that
all meetings involve the same number of participants, but that number can
be changed easily from run to run.
Suppose Alice creates a meeting request. She first checks to see that she
is free during the (randomly chosen) meeting slot. If so, she communicates
the meeting request to the other (randomly chosen) participants. If they're
all free, they should all end up with the meeting on their calendars.
If anyone isn't free, the meeting request is simply dropped - there is no
attempt made to find a time when everyone is free.
Each participant has a calendar. The calendar is actually implemented
as a simple SQLite database. A
Java class (class DB ) wraps access to the database, so you don't have to
understand anything about it to complete this assignment.
Nonetheless, here are a few details. The database has two tables.
One, calendar , has fields for the meeting slot and
description: each meeting occupies a row of that table. Another
table, participant , has fields slot and user name. There
is one row for each participant in the meeting taking place during
that slot. The two tables can be "join'ed": given a row
in calendar , the database system can find all rows
in participant that have the same slot time as
the calendar row. Joining gives us the list of
participants. (The calendar table also has a 'status'
field, and the DB class supports its use. It's function will be clear
once you've read the sections about the protocol.)
This application is fully built, with the exception of the incomplete
and incorrect parts, explained next.
The Incomplete Part
The Java implementation includes a class, Message , that
is used to communicate information from one participant to
another. Message is primarily about data representation:
how do I encode what I'm trying to say so that the receiver can decode
and understand it?
Almost none of the Message class code is in the skeleton. Your
second job is to implement it. Your first job is to decide how to do
the encoding required.
We've looked at this problem repeatedly throughout the course. With
the exception of material from last week's section (HTTP and a mail
protocol), the solution has always been packets composed of a fixed
header plus a payload, e.g., Ethernet, 802.11, IP, and TCP. It would
be natural to try to use those protocols as guides in solving this
problem. But, you should be aware of some differences that might lead
you to a different sort of solution:
- This application is built on top of TCP, i.e., on top of a
connection-oriented, reliable transport. Thus, we're not trying to
communicate the same kinds of things as those lower layer
protocols. For instance, we don't need to name the destination (TCP
has handled that), and we don't need to use sequence numbers (TCP has
handled that). What we do have to do is communicate what "kind" of
message is being conveyed, and what additional information it carries.
For example, one kind of message might ask whether a particular
meeting can be scheduled (i.e., the slot is free). We need to
indicate that that's the kind of message we're sending, and what the
meeting is.
- Your should value robustness over efficiency. By that I mean you
want to design a scheme you can implement and debug in short order;
you don't care very much if it's as compact (low overhead) as
possible. Robustness means, among other things, that mistakes
encoding messages don't crash the process receiving and decoding them.
As you'll see, a receiver is always free to simply ignore any incoming
message it doesn't understand - that ends up being equivalent to
claiming to be busy during the meeting time.
It's not unlikely that you'll change your mind about what has to be
encoded, and how to encode it, as you complete this work. A robust
design is usually simple. Being simple will make it easier for you to
make changes. Value simplicity. Try to decouple decoding from
encoding, so that little changes in design don't mean big code
changes.
Once you're read a bit more about the protocol being used (below), you'll know
what kinds of information you have to be able to send and receive. With that,
you can design your message formats. Implementing your format should involve
only a single source file, message/Message.java .
The Incorrect Part: The Protocol
I'll start by explaining the protocol that is implemented. Then I'll explain
why it doesn't work.
Here's a picture showing how the protocol operates when Alice generates
a meeting request:
Alice generates a meeting. If she can make the proposed time,
she offers the meeting to Bob. If Bob can make it, he writes it
on his calendar and replies "commit" to Alice. (If he can't make it,
he replies "abort" to Alice.)
If Bob can make it, Alice poses the same question to Charlie.
When Charlie says he can make the meeting, Alice enters it on her schedule.
Our goal is the following: if any participant ends up with the schedule
on their calendar, all participants must end up with the meeting on their schedules.
It should be obvious that this protocol doesn't achieve that:
if Charlie replied "abort," Bob would have the meeting scheduled but Alice and
Charlie wouldn't.
You're probably already thinking that this protocol was silly - Alice should ask
everyone if they can make the meeting, collect the responses, and then tell everyone
to schedule the meeting if all participants have agreed they can make it.
Unforunately, that doesn't work either. The simplest problem with it is that
while Bob may have been free when first asked, by the time Alice gets around to
confirming that Bob should write the meeting on his schedule, he may have already
written some other meeting in the same time slot.
Now before you start thinking about how to fix that, let me mention
another, more subtle, problem. Consider this simpler situation, in
which the meeting involves only Alice and Bob. This exchange follows
the "ask everyone first, then commit" rule (or is at least equivalent to
the more direct implementation of that rule, that involves more message
exchanges).
The problem here is that Alice might crash between sending the message
to Bob and receiving the reply. In that case, Bob has the meeting scheduled,
but Alice doesn't. There is no simple fix for this: no ordering of messages
and committing to the database can work when machines can crash at arbitrary moments.
(You should take a moment to consider some orderings, and why they don't work.)
Two Phase Commit
Two phase commit is a distributed consensus protocol: a protocol that results in
all participants agreeing on an outcome. In this case, the outcome is either to
commit the meeting or to abort it. The description of two phase commit
here is tailored to our use; additional information is available everywhere, as
this is an extremely important protocol.
For our use, two phase commit relies on a proposed meeting having
three possible states, with respect to the database. Two of them are
what we had before: not in the database and commited in the database.
(A committed meeting is one that is scheduled. It can never be
removed.) The third state I'll call precommit. The meeting is in the
database, so the slot appears busy, but the entry does not indicate
that the meeting is scheduled. Instead, it means that more of the
protocol needs to run before we know whether to commit the meeting or
to remove the precommitted version of it. The precommited meeting is
put in the database because, by assumption, that way it survives crashes: if a
machine goes down, when it comes back up it can find the precommitted
meetings and do something about them (explained in a while).
Here's a picture of two phase commit:
Alice precommits the meeting. As she asks each participant, they
precommit the meeting as well, and then respond to Alice. In all
cases, this prevents another meeting from being scheduled in that
slot. (We don't mind that this meeting might eventually be aborted,
while the other one could have succeeded. It's okay to err on the
side of refusing meeting times.) When everyone agrees to the time,
Alice commits the meeting in her database. She then tells Bob and
Charlie to commit as well.
Now let's consider what happens in other cases, including crashes.
We'll consider just a few cases. You should think about others as
well.
Suppose some participant replies 'abort' to the initial message.
Alice then sends aborts to everyone, and they all remove the
precomitted entry from the database.
Suppose Alice crashes at any time before commiting in her database.
In this case, the meeting will never be commited. Bob and Charlie
will notice the precommitted entry and repeatedly ask Alice what the
status should be (using timeouts to resend requests). When Alice
comes up, she'll find the precommitted entry (created by her), and
delete it. When she gets an enquiry from Bob or Charlie, she'll say
she doesn't have that meeting on her calendar. That will cause
Bob and Charlie to delete their precommitted versions.
In contrast, suppose Alice stays up throughout, but Bob crashes after
sending his commit back to Alice. When Bob recovers, he'll find a
precomitted meeting originated by Alice. He'll then ask Alice whether
or not the meeting is scheduled. If Alice has it on her schedule,
she'll reply commit; otherwise she'll reply abort.
Suppose Bob is down when Alice tries to schedule. In this case, Alice
times out waiting for a reply from Bob, and the time out is considered
an abort response from Bob.
Note: We will not actually cause any crashes in running our code.
We will also not bother to implement any recovery code that needs to be
executed after a crash has taken place.
What we will do is implement the message exchange sequence shown above,
when there are no crashes. (We'll also allow participants to reply abort, of course,
and do the right thing in that case.)
Recapping the Tasks
- Design and implement a message encoding scheme.
- Change the message exchange sequence to conform to two phase commit when
there are no crashes. (Don't try to implement any code related to recovering
from crashes.)
- Write a brief report explaining your message encoding and how you
implemented two phase commit. Grading will be primarily (and in most cases
exclusively) based on the report.
Hand in your code and report online. There is one submission per team.
Make sure to clearly identify the team members in your report.
Due Date(s)
This is comfortably doable by next Wednesday. Unfortunately, next Wednesday
is the second mid-term. If you start immediately, and don't run into any
serious unexpected problems, and two phase commit makes sense to you,
then being done by Monday, 3/9, should also be a comfortable schedule.
Otherwise, there should be no problem starting now, working up to Monday,
then spending your 461 time studying for the midterm, and then finishing
by Friday, 3/13 6:00PM.
As in previous assignments, there is not very much code to write; it isn't
about coding. On the other hand, there is some understanding required before
you can do the writing.
In all cases, you should make sure you understand what to do,
and how you're going to do it, by Monday, 3/9,
even if you don't finish the coding and/or writeup by then. The second midterm
is 3/11.
Skeletal Code Issues
Implementing your message encoding should happen entirely within
message/Message.java . That class supports three kinds
of messages: scheduling requests, commits, and aborts.
Remember that you're transmitting messages over TCP, and that TCP is
stream oriented. This means that TCP does not provide any indication of
where a message ends; your encoding has to do that.
It goes without saying that some encodings are easier to implement in
Java than others. Try to pick one that is easy.
The protocol changes happen in calendarapp/CalendarClient.java .
That program is similar to ChatApp from hw2. In particular,
it is multithreaded. One thread, the one that entered at
main() , sits in a loop generating new, random messages,
and then engaging in the role of Alice in the pictures above (i.e.,
sending meeting requests, collecting responses, etc.).
A second thread sits around reading meeting requests sent by other
clients. Because this is TCP, what actually happens is that the
second thread is waiting for connections from other clients. When it
gets one, a new socket is generated. The second thread creates a new
thread each time this happens, handing that new thread the socket that
was just created. That new thread then acts as Bob in the pictures
above.
The skeletal code implements timeouts on reads: if no response is received
after some parameterizable amount of time, an exception is thrown and the
thread stops waiting for input and instead executes the catch clause
handling the exception. We're unlikely to see actual crashes during
execution. However, incomplete or buggy code can result in no response
being received, so these timeouts can occur. Also, it's possible to
run a test for a world in which Alice, Bob, and Charlie exist, but one or
more of them isn't currently running (e.g., you run clients for Alice and
Bob, but not Charlie). You should handle this just fine. In this
case, though, the code will be unable to connect to Charlie - it's not
a read timeout that happens, it's a failure to establish a TCP connection.
(This is also an exception, also handled in the skeletal code.)
Obtaining the Skeletal Source
- Download hw5.tar.gz.
- Expand the file you just fetched. If your browser hasn't
already let you do that, use
tar xzf hw5.tar.gz .
A new subdirectory,
hw5 , will be created.
- You now have the source. To create javadoc for the source:
make jdoc The javadoc is also
available online.
Getting Ready
The code you'll be working on is Java, and so you might think it would
run on pretty much anything.
That would be naively optimistic.
Additionally, there is a set of build and test tools,
much like the ones used in hw2. These have been developed on Linux,
without much regard to Windows portability (because it's hopeless,
basically). Everything has been tested on the attu's. The default
assumption is that you'll be working on the attu's.
All that said, the attu's aren't configured enough to run either
the Java application or the infrastructure tools (described next),
at least without some help. So, do this:
- Once you have untar'ed the source, cd into directory
javasqlite .
- Don't bother to read the
README.txt
- Execute
build.sh ('./build.sh').
- Now 'cd ../perlSqlite'
- And then './build.sh'
You need to do this setup only once, on each machine you'll use
for either running or editing code. (The attus are one machine in
the sense of that sentence. If the lab Linux boxes mount the same
home directory for you as the attus do, they're also likely to be part of
the attu "one machine.")
This might work on other Linux systems, depending on what else is installed
on them. (We need perl and sqlite3 and probably things I don't know we need.)
It also might possibly work on Macs; it's very likely it could be made to
work on Macs.
Note that these builds have to be done on the machine the code will run on.
(The attus, and possibly lab Linux boxes, are "one machine" for this purpose.)
You can't build on attu and then copy all the resulting files to your personal
machine, for instance.
Build, Execution, and Test Environments
The build/test environment is a lot like that for hw2, but with
some improvements. As with hw2, life will be a lot more pleasant
if you have ssh keys set up.
Here are the basics:
- To build, just say
make .
- There is no lobby in this application.
Instead, the file
clientlist.txt is a static list
of clients. Look in that file for format. Remember to change
the port numbers to something you choose, that others aren't likely
to have chosen, and that aren't in the restricted port range (below
1024 or thereabouts) or out of range for ports (above 64K).
This file places no restrictions on how many clients you run.
As it comes out of the box, it is set up for three.
There are also no restrictions on how many clients can run on a single machine.
The only restictions on what machines you can run clients on are:
(a) you must be able to compile and execute the code on those machines,
(b) the machines must have hostnames resolvable by DNS (or at least routable
IP addresses), and (c) it must
be possible to initiate a TCP connection with the host from all other
machines running clients (i.e., think NATing).
- That file is used to indicate where clients should be, not to start
up clients. To start clients:
- Open up shell windows, one for each client you want to start.
Note that you don't have to ssh into attu in these shells.
- Now execute './become.pl alice' (or bob or charlie or whatever).
The become script will now wait for an indication that client alice should
run. When it gets that indication, it will ssh to the machine that file
clientlist.txt indicates alice should run on, and execute the
calendar application there.
- Start up other clients in other windows, using the become.pl script.
Note that you don't have to start a client for each client
in
clientlist.txt . If you don't run charlie, for
instance, no meetings that involve him will be committed, but the
application will run just fine without him around.
- In some other window, './go.pl' will start up all the clients.
- './go.pl clean' will kill hung java executions (much like in hw2)
- The clients write log files as they execute. './go.pl printlog alice'
will print the one written by alice, for instance.
- './go.pl printcal alice' will print the database (calendar) of meetings
alice created, for instance. (Alternatively, you can execute
'./printcal.pl alice' if you're in a directory that has file alice.db - that's the file SQLite uses to hold Alice's calendar database.)
- To customize execution parameters, edit file
go.pl .
Near the top you'll find assignments to $MEETINGSIZE, $NUMTRIALS, and
$SENDDELAY. These are the number of participants in each randomly created
meeting proposal, the number of meetings to create before terminating, and the
time to sleep between successive meeting proposal creations.
(To customize the number, names, and execution locations of clients,
edit clientlist.txt .)
Note that all commands now work when executed on a local Linux box (if
you have one) that has the skeletal files installed: no need to ssh
into the attus. Of course, the attus still need to have the
executable code installed on them, so the easiest thing for you might
be just to ssh into attu and do all work there.
For advanced users who want to work mainly on a personal
machine. In developing the skeleton, I did the initial setup
described above on attu, then untar'ed the skeletal distribution on my
home Linux machine as well. After this initial setup on attu, I edited all
files (Java source, plus control files
like clientlist.txt ) locally - this is a lot faster
for me than editing on attu. Doing a 'make
remote' causes the updated local files to be copied to attu
and compiled. For this to work for you, you'll have to do a tiny bit
of editing in the makefile - there are some paths (e.g., REMOTEDIR,
near the top of the makefile) that need changing. It'll be obvious
how to change them... (Note that this scheme works even if you can't
do a complete setup locally. All you need are the untar'ed source and
control files, plus a working install of perl and make. It also works
if your local machine is behind a NAT box, as mine is.)
It should also be possible to run entirely locally - all clients, plus
the ./become.pl invocations, plus the ./go.pl invocations on a single
machine - assuming the setup procedures succeed on that machine.
File clientlist.txt has instructions on how to specify
local execution of the clients. This is an untested feature.
Understanding Results
You can use './go.pl printcal alice' to print
alice's calendar after she's done executing. You can also
print Bob's and Charlie's calendars. Comparing them is a manual
process.
The basic notion of
correctness is that if any client has a meeting scheduled
(committed), then all partipants have that meeting scheduled.
There are some corner cases, though. Because client termination
isn't synchronized, it's possible for some client to permanently
"crash" at the end of execution, as seen by other not-yet-terminated
clients. This could leave it with
uncommitted meetings that are committed by other clients.
Such meetings should have precommitted status, though.
It might be possible to automate comparing the calendars of
different clients. At the moment there is no tool
for doing that.
Additional Remarks
Some of you may have heard of iCalendar , a
popular calendaring
format
and protocol. I'm
mentioning it only for completeness. I don't think it will be a help
on this assignment to look at anything about it (and I certainly
wouldn't try to build it). Nothing stops you from having a quick
glance, though, if you're so inclined.
Two phase commit, while an important protocol, has some important
drawbacks. Those have led to... you guessed it,
three phase commit.
I bring this up just in case it's handy to drop names in an interview
(or because you might be interested in these protocols!).
Note that most of the material on the web about both two phase
and three phase commit is explained in a transaction processing (database)
context that might seem a bit unfamiliar. My expectation is that
you shouldn't have to read any outside materials about 2PC to complete
this assignment, and I'm sure you don't have to read anything about 3PC.
Finally, I've bastardized 2PC a bit, to make it simpler but sufficient
for our needs. In particular, I'm not very reluctant to decide not to
schedule a meeting. A real implementation is a bit more complicated,
and could result in more meetings being scheduled. If you're
interested in the details, I highly recommend taking the database
course. Databases, and distributed transactions, are decades old
ideas, but have recently become central to a big swath of
applications. (I also recommend SQLite, rather than files, whenever
you're thinking of using files. The database course will explain
the SQL query language to you as well.)
|