CSE 461: Introduction to Computer-Communication Networks, Winter 2010
  CSE Home   About Us   Search   Contact Info 
 
Course Home
  Home
Administation
  Overview
  Using course email
  Email archive
  Anonymous feedback
  View feedback
 
Assignment Utilities
  Homework Turnin
  Assignment Wiki
  Gradebook
  Discussion: Protocol Bert
  Discussion: Protocol Ernie
  Discussion: Protocol Elmo
 
Most Everything
  Schedule
    Printer friendly version

Homework 4 - p2p flickr
Out:Wednesday February 17
Due: Per-section protocol RFCs: Thursday, February 25
Per-person RFC reports: Friday, February 26, 11:59PM
Per-team implementations: Wednesday March 10, 11:59PM
Implementation teams are encouraged but not required
Turnin:Online

Overview: Protocol Design / Networked Application Design

This large, final assignment is an opportunity to design and build a non-trivial networked application. Doing so necessarily involves designing protocols. This application has (at least) two. You'll also re-encounter ideas rom homeworks 0 and 1 (if not the code itself), and then amplify them into something truly great.

You can view the assignment as either building a peer-to-peer (P2P) version of flickr, or as building the Bayou P2P storage system. Either way, the idea is that you run an application on your machine. That application lets you add, remove, or replace photos that it manages (as well as add, remove, or edit comments on photos it contains). You can do that whether or not your machine has network connectivity; it's a purely local operation.

What the application provides is a shared view of the union of all the photos (and comments) everyone using that application has submitted. It does that by exchanging information between pairs of participants. For instance, if I've recently added a photo on my machine and my application instance talks to yours, they might notice that your machine was missing the new photo. My application instance would then transfer it to yours.

More specifically, a running instance of the application periodically looks around for other running instances of the same application. (This is called discovery.) When it finds one (or is itself found by one), each sends the other data (photos and comments) that the other is missing. This pairing of application instances is opportunistic; if you're running the application on your laptop while in the coffee shop, and someone else in that coffee shop is running an instance, the two may find each other and communicate. There are no central servers at well known addresses, you just communicate with whoever, whenever. (This general style of communication has a number of names: gossiping, epidemic, anti-entropy, and (perhaps) social networking among them. We'll use the term Bayou uses, anti-entropy.)

The major constraints on the application are:

  • eventual consistency: if updates were to stop, but opportunistic communication were to continue for long enough, eventually the local states (set of photos and comments) of all copies of the application would be the same.

  • disruption tolerant networking (DTN): There is no central server, and there is no time at which the all application instances are necessarily online, i.e., the connectivity is disrupted. Nonetheless, the application is useful always: all operations it provides are available to the local user all the time.
To achieve this, we have to live with the fact that, at any moment, the local copies of the data are only weakly consistent: they may have somewhat different data, and they may disagree about the ordering of some of the data (e.g., the order in which comments occurred).

Application What: P2P flickr

User Actions
The following must be supported. You can add additional operations if you like. The operations listed are not a specification.

  • add photo
  • replace | delete | recreate photo
    (recreate assigns to a photo that has been deleted)
  • add comment to photo
  • replace | delete comment
  • View photos / comments

Important: The assignment is to build a photo/comment sharing system that provides eventual consistency. All aspects of the design/implementation are up to you. What follows are specific suggestions. They are not intended to be complete. You will have to decide many things for yourselves, even if you decide to follow the suggestions.

Software Structure Overview
Logically, there are three parts to this application:

  1. Discovery
    The discovery component implements the discovery protocol. Designing the discovery protocol is a key part of this assignment.

    The server side of the discovery component waits around in some well-known place. When contacted, it provides the information needed to contact the anti-entropy component. (In slightly more detail: the discovery component is listening to a UDP port with a well known number. The anti-entropy component creates a TCP port with an ephemeral port number, registers the port number with the discovery component, and then waits for a connection on the TCP port.)

    The client side of the discovery component knows how to look for (i.e., discover) other running instances of the application, and so to obtain the information needed to connect to their anti-entropy components.

  2. Anti-entropy

    The anti-entropy component uses the anti-entropy protocol (the design of which is another key part of the assignment) to obtain updates from a peer.

    The server side of anti-entropy waits to be contacted by another instance. Once contacted, it figures out what data it has that the contacting instance doesn't, and transfers that data to it.

    The client side of anti-entropy uses the discovery protocol to find another running instance, contact it, and receive update information from it.

  3. UI
    The UI handles local operations: adding, replacing, and deleting photos; adding, editing, and deleting comments on photos; displaying photos and comments.
Here is a high level overview of the suggested implementation strategy. Rectangles represent indivdidual processes -- that is, execution of a single (possibly multi-threaded) program. The disk and cube icons represent data: an sqlite3 database and directories on the local machine, respectively. The happy face represents the user.

p2p flickr is the focus of the assignment -- the implementation of the discovery and anti-entropy protocols. That component has no UI. Logically, it runs as a daemon, although it's easier to just start it manually and kill it when you're done.

Persistent Data: sqlite3

The database holds persistent data (data that survives between runs of the program). This includes information about the photos that have been registered (the photos themselves are kept as files in the .../images directory.), the comments on them, and anything else required. sqlite3 is particularly handy for this - it's just a library, so requires no setup or administration.

We provide sample code that shows how to use sqlite3 from Java. This page provides an overview of the use of SQL databases in Java, and is the best reasonably short, reasonably comprehensive overview of SQL databases that I found. Note that SQL is only a quasi-standard; different implementations may do things slightly differently, or may not implement all features. Consult the sqlite3 documentation if you run into inexplicable problems. Note also that between your Java code and the sqlite3 engine is the JDBC, Java's unfathomably clunky attempt to standardize access to SQL databases across different SQL implementations (e.g., sqlite3, mySQL, SQLServer, Oracle, Postgres, ...). Despite all that, I expect you should have at most minor problems using sqlite3, even if you've never used a database before, especially if you ask for help if you get stuck.

Since multiple pieces of code interact with the database, it would be a very good idea to write a layer that abstracts the its details from the rest of the code. That is, instead of embedding SQL queries here and there in the code to add a photo to the database, put that SQL into a library method and call it where needed. That way if you need to change the database design (the tables and/or their fields, and possibly even what the field values mean), you can do so easily. Additionally, the library makes it possible for people not familiar with SQL to contribute code at full speed.

The UI

In part as an aid to debugging and testing, there is a UI to interact with the current data state. The suggested UI has two parts. We use a browser (e.g., Firefox) to view the current state of the data. The 'html page gen' component reads the state of the database and creates html pages that display it in some handy form. A sample implementation is provided. The resulting interface looks like this. Note that there is no web server involved - the browser is simply displaying local files.

The input side of the UI has to handle things like additions, replacement, and deletion of photos and comments. We suggest this be implemented as a Java program with a command line interface. (Note that it can't be implemented using the viewer/browser interface because there is no web server behind the pages being viewed.)

One function of the UI is to help you debug: it's a handy way to view and manipulate the local data. The sample code suggests a minimum level of functionality for the viewer component. Beyond that, there are no particular requirements for the UI. There is more than one way to design and to structure its implementation, and many enhancements that you'll probably find desirable and that require only a bit of work to implement. If it were me in this situation, I would for sure build a UI like the one just described.

The diagram above shows the html page generator as a separate program. The sample code is, of course, a separate program. However, it's primarily the implementation of a class that knows how to generate the html pages that compose the sample viewer interface, given proper invocations of its methods. You probably want to invoke html page generation whenever the database state is changed, i.e., after acting as the client in an anti-entropy session or after handing some user input command. In that case, the simplest way is to use the page generation class as part of the p2p flickr and updater implementations; there doesn't have to be a distinct page generation program. (Such a program would undoubtedly be handy to have, though, even if page generation is also integrated into other components.)

p2p flickr

Here's a bit more detail on the main component:

The figure is meant to suggest the following:
  • It would probably be a good idea to implement a Discovery layer (the blue rectangles), with a clean interface, and an Anti-entropy layer (the yellow), with a clean interface. The interfaces abstract the specific data sent (or received) into more meaningful operations. For example, Discovery methods for its server side might be void waitForDiscoveryRequest() and void sendAEConnectionInfo(InetSocketAddress aeServer). (The "AE" in the latter refers to anti-entropy.) A tiny bit of code, implemented in the p2p flickr app, would glue these together to implement the discovery daemon. As a rough rule of thumb, you'd expect a layer to export a method for each "logical operation" supported by the protocol, and the p2p flickr code to use these to do its job. This insulates the p2p flickr code from changes to details of the protocol (e.g., encoding), something that will probably take place at least a few times.

  • The suggested implementation has two threads. One is the Discovery server thread (shown in blue). It simply waits for some other instance to contact it, provides some response, and then waits some more. The other thread (in red) is doing two logically distinct things: it's acting as both the client and server sides of the anti-entropy protocol. The reason to have one thread do both is to preclude unfortunate race conditions, as you might have if separate threads were reading/writing the database at possibly the same time. (It actually shouldn't be very hard to use two threads, it's just that it's easier to get something wrong in a way that will be hard to debug.)

    For this single thread scheme to work, the thread can never block forever trying to read from a socket. So, you must use the versions of reads that time out if no data is received after an interval you specify. (Note that this would be required for at least some reads in any case, because you can't count on the remote end working, and you don't want a broken remote instance to cripple your instance.)

Application How: Bayou

Bayou was a research project at Xerox PARC during the early-mid 1990's. It provides a solution for the technical problems faced in creating a disruption tolerant application that ensures eventual consistency. I suggest you read up through Section 2 of this paper, plus Section 4.3. (If you can't view postscript files conveniently, a local pdf version of the paper is here.)

What follows is an overview of the parts of Bayou we're taking, plus a few other issues not addressed or that we're doing differently.

All changes to the state of the database are called writes. All writes are kept in a log, sortable by the order in which they should be applied. In that sense, the current contents of the database are simply a cached result of executing the writes in the log.

Each application instance keeps a kind of logical clock. It's helpful for these clocks to be in rough synchrony. For operations initiated locally, the clock therefore works like this:

new clock = max( current time of day clock, old clock + 1 )
That is, the clock ticks at the resolution of the time of day clock when nothing is going on, and always ticks fast enough to provide unique timestamps to distinct events. This scheme isn't required for correctness. It's useful to get ordering about right while instances are only weakly consistent. (Note that "time of day clock" is a phrase meaning something like java.util.Date.)

Additionally, the logical clock respects causal ordering. When an instance learns of an event with timestamp T assigned by any other instance, it updates its own clock:

new clock = max( T+1, old clock )

One way the local database is updated is by actions taken by the local user. When the local user causes a local write, the write is recorded as an entry in the local log, including the originator of the write (i.e., the id of the local instance) and the timestamp of the write (as assigned by its originator). The local database is updated as well. (Note: the log is implemented as a database table, as it must be persistent across runs. Despite that, we use the term "database" in this document to mean all data other than the log. Entries in the log are never deleted, in our implementation.)

When two instances engage in anti-entropy, the updates from the server side are transferred to the client side. Those updates are entered in the client's log. The database is then reconstituted by (a) emptying it entirely (except for the log, of course, and one other exception noted under "Naming" below), and then (b) applying writes from the log in timestamp order. When that is done, the local database contains all locally originated writes up to the current local timestamp, plus all of the writes that originated on the server up to the highest timestamp of any such write conveyed during anti-entropy. The client remembers these two timestamps, and which application instance they are associated with. If it subsequently acts as the server side of an anti-entropy connection, it can convey both writes it originated and remotely originated ones it learned about through anti-entropy. Eventually, all instances learn about updates from everywhere.

To help with anti-entropy, each instance keeps a version vector: a vector that gives the highest timestamp update it knows about for each instance it has ever heard of. When anti-entropy takes place, the client sends its vector to the server. The server compares that vector to its own and sends whatever updates the server has that the client does not (those with timestamps higher than the client's version vector value for the originator of the update).

For that to work, it's essential that every instance always have every update with timestamp less than or equal to the value in its version vector, for each origination instance. This must be true even if failures occur. To achieve that, a few things are necessary. First, servers in anti-entropy must convey writes in timestamp order. Second, the anti-entropy client must not update its version vector until after it is sure the write is recorded permanently (i.e., is in its log). Third, if the write names a file (e.g., an image file), it must not record the write until it is sure the file has been correctly created.

Application Implementation Notes

Differences from Bayou

All our writes are single operations, either assignments (e.g., creation of a photo, creation of a comment) or deletes. Even editing is assignment; e.g., editing a comment is simply assigning the comment a new value. We therefore don't need to think of writes as sets of updates, and we don't need dependence checks or merge procedures: the value of an object is the last assignment or delete on it in the ordered execution of the log.

We are not worried about being able to truncate the logs, ever. A lot of Bayou mechanism (past Section 2) is therefore not required.

Application instances can be created, but not deleted. That could allow some simplification relative to the instance creation and deletion scheme in Section 4.3 of the Bayou paper.

Bayou imposes a globally unique ordering on writes by having some single, designated instance be the sole authority on ordering. We'll instead order by the timestamp assigned to the writes by the originating instance, breaking ties in any way that guarantees the same result is obtained at all instances.

Naming

You'll need to be able to name things. For example, to add a comment to a photo, the code has to be able to specify which photo. To later edit that comment, the code needs to be able to name the comment.

Naming is perhaps a tiny bit trickier than you might at first imagine. Our main requirement is that an application instance can create a name that is guaranteed to be globally unique (i.e., a name no other instance might also create), even when the instance is completely offline. (A method to do that probably occurs to you right away, based on earlier course material.) It turns out that there's a generally used scheme to solve this problem, known as Universally Unique IDs (UUIDs). Java provides an implementation, and a sample program distributed with the homework shows how to use it.

UUIDs are the names of things, for example, as stored in the database. For photos, you also need a local name per instance, in particular, the name of the local file holding the image data. When a photo is entered into p2p flickr through the UI, you create a UUID for it, make a copy of the local file in some directory that only the p2p flickr application manipulates, and store the copy's local filename in the database. These local names have no global meaning, though, and are never communicated from one instance to another. When anti-entropy is performed, the global name of any new photo is provided to the client, and then the raw data of the image is sent. The client receiving the data makes up its own local file name to store the data in. (A sample program shows how to generate a new, unique, local filename from Java when running on a Unix system. Method createTempFile() of Java's File class may also be able to do the same thing, in a system independent way.)

You should store the association between a photo's global (UUID) and local (file path) names in a database table used only for that purpose. That table is NOT emptied before replaying the log after an anti-entropy exchange. (Since replaying the log does not allow you to recapture the image data bytes, you have to remember where they are.)

You should store the local (file path) names of photo data bytes in log entries.

Because instances are allowed to be created, but never deleted, instance names can be UUIDs as well (and not the recursive names used in Section 4.3 of the Bayou paper). We also don't need to include instance creation as a write operation entered into the logs. A client learns of the existence of a new instance by hearing about writes that originated at it during an anti-entropy session.

Error Handling

Your implementation should tolerate fail-stop errors of remote instances. That means that a remote instance can go down completely at any point whatsoever and the application should still have the eventual consistency property. ("Fail-stop" means the only failure mode is stopping. In particular, a failure won't cause a remote node to start sending bogus data.) Additionally, you can assume that there are no attackers of the system; all instances are trying to do the right thing, not trying to break things.

At the same time, both remote and your own instances are likely to have bugs. You should try to protect yourself against them, especially when they could cause critical problems. (For instance, suppose a remote instance has a bug that causes it to incorrectly transfer image files. If that goes unnoticed the corrupted files will be propogated around the system. Recovering could be difficult.) It's a matter of judgement what is worth checking, and how much redundancy to design into your protocols to allow checking. (Usually the amount that minimizes development time is slightly more than I've actually done, in my experience.) Expect bugs, especially at first, and remember that you're dealing with global, often offline, state.

You can assume that TCP is reliable. (The remote client might die, or simply send bogus data, but TCP itself won't corrupt transmissions.)

You can assume that the local system is just as reliable as you've always assumed it to be. (That means that you don't need to go to extraordinary measures to check that a write to a file succeeded, for instance; you can just assume it has. This is to keep the programming effort down for this assignment; a real implementation would want much more evidence that things had actually worked.)

IP Broadcast

IP broadcast packets will not traverse routers. You therefore won't be able to discover instances other than those running on the same network as your instance.

To Be 100% Clear

This is not TOMCast. The various instances are only weakly consistent. What an instance displays right now is everything it knows about, but it may not know about everything. No two instances may have exactly the same state at any moment. Additionally, at least logically, most clients are offline most of the time.

Activities

Per-Section Protocol Design: 2/18-2/25
We'll design two versions of the discovery and anti-entropy protocols, one per section of the course. We'll begin in sections on 2/18. We'll use some of sections on 2/25 to resolve any lingering issues, since it's a time that everyone is guaranteed to be available to meet.

Each section (or group) will produce a short specification in the form of "RFC." Pick a handful of people to write notes to capture the current state of the design. Keep it short, likely a page or two. We will provide a way to put these online so that each group can see the RFCs and continue to define and improve them.

Your section should produce RFCs as follows:

  1. Decide on the operations needed to support the discovery and anti-entropy protocols. (A sub-component of anti-entropy is file transfer, which is different enough that it might be considered a third piece.)
  2. For each operation, describe the parameters with which it is invoked and the messages it causes to be exchanged across the network between instances (e.g., a request and a response? how are errors indicated?).
  3. Give the formats for each message or other information that is exchanged.
  4. Describe how these messages are sent using the services of TCP or UDP.

Keep in mind that these designs are only about the network portion of the application. They should be kept short and be used for interoperability. They should not specify details that are up to implementers, like how often to try to discover nearby instances, or with which instance to engage in anti-entropy. There will likely be multiple versions as you fill in or change details, so you should consider what will happen when the version changes. Another tip is that formatting of messages is often the source of much complexity. Prefer simple formats that are easy to implement in Java and easy to debug. Do not worry about bandwidth efficiency; most bytes transferred are likely to be image bytes in any case.

Per-Person Protocol Evaluation: 2/26
Hand in a short (less than one page), personal, evaluation of the protocol designed by your section. You should identify two things:

  • what you think is the strongest positive about it, and why.
  • what you guess will be the biggest problem with it, why, and whether or not you guess that aspect of the protocol will have been changed by the time the assignment is finished.
You can think of these as "minority reports" about the protocol design. They're due end of Friday, February 26. PDF files are preferred.

Application Implementation: 2/18-3/10
You're encouraged to work in teams of 2-3 people. You'll implement the p2p flickr application using the protocols designed by your section. A goal is that all implementations using the same version of the protocol should work with each other. Note that your team can usefully begin working before the protocol RFC is finalized, at least in the sense of discussing overall design, agreeing on build tools (e.g., will you use make, ant, or eclipse? what editor tab settings will you use? will you use a source control system? which one? how will you communicate and track bugs? etc.), and coming up with lists of initial responsibilities.

Your team should begin by designing interfaces for the discovery and anti-entropy protocol implementations. You should also design the database (what tables exist and what data goes into them), and possibly a database abstraction layer. You should do this during the week following the initial protocol design in sections on 2/18. That way you'll uncover potential protocol problems and be able to suggest improvements before protocols are "finalized" on 2/25.

Most of the pieces of the application can be developed and debugged largely independently. For instance, work on the anti-entropy portion can begin before the discovery portion is done by having you (the human user) provide the IP address and port of a remote instance to contact. The database can be seeded with information using the sqlite3 command line interface. The database abstraction layer, assuming you're writing one, is needed widely, so should have high priority early on. Design and development of other components can be done while the abstraction layer is being debugged, though, once you have an interface specification. Stub components replacing the abstraction layer may be useful to allow debugging of other components during database abstraction layer debugging as well.

I would aim to have all components "integrated" by Wednesday, 3/3. By integrated I mean that components are using the actual libraries you've written, not stubs, even if those libraries are still somewhat buggy. Among other things, integrating will cause you to uncover bugs faster. Additionally, integration always reveals miscommunications about the interface specifications between the library user and its implementor, and you'll need time to work those out.

For turnin, hand in all of your code, preferably as a tar file. Also hand in a short report listing the members of your team, and answering the questions given in the next section.

Implementation Report Sections/Questions

Hand in one report per team. Clearly indicate who the team members are, and which protocol suite (i.e., section) you implemented.
  1. Is it possible for an instance to learn of a comment being applied to an image for which it has not received the creation write? If yes, what did you do about that? If no, explain why not, and also explain what you did if the situation arose even though it should be impossible (if all implementations were bug-free).
  2. Explain the strategy used by your implementation to find remote instances, and to choose one to contact for anti-entropy.
  3. Give one example of a decision that is significant from the point of view of how well the application works but which is not (and should not be) part of the protocol specifications. (Do not use the question asked above, how remote instances are chosen for anti-entropy, to answer this question.)
  4. The Internet Robustness principle in RFC 791 says "Be conservative in what you do; be liberal in what you accept from others". It makes protocols less fragile. Give one example of how you followed this principle in your implementation. (You might learn this when you don't interoperate with someone else when you both think you have it right!)
  5. Did you implement any "features" beyond those explicitly described in this document? (That includes taking an entirely different implementation approach.) If so, describe what you did.
  6. Provide a short user's guide, appropriate for the course staff, making precise (a) how to build your application, and (b) how to run the components of your application, i.e., how to launch the p2p flickr component and how to run both sides of the UI, including how to add and delete photos and comments (if that isn't obvious after launch). (You submission must build and run on a standard Java installation, so needs to contain any exotic jar's or other files it might require. attu is the definition of "standard java installation.")
  7. Does your implementation work? If not, describe what isn't working and what you think the fix might be.

What You Get

There is no starter code. You're writing things from scratch. However, you have the following resources:
  • Homework 0 - performed file transfers (but has no real protocol).
  • Homework 1 - experience with logical clocks. Also note that package util from that homework might have code useful in this project (especially the Logger, but possibly also the networking utilities).
  • Course material on DHCP - one component of DHCP is discovery.
  • hw4.tar.gz - some sample code.
    The tar file expands into a directory subtree. One of the directories is named samples.

    samples/java contains:

    • guid.java - make a UUID in java.
    • DB.java and sqlitejdbc-v056.jar - shows sqlite3 database creation and querying from java. To run:
      java -cp .:sqlitejdbc-v056.jar DB
    • jmkTemp.java - create a unique file name in Java when running on a *nix machine.
    • HTMLPageGen/HTMLPageGen.java - a class that produces the html pages for the viewer side of the UI. You must stitch it into your code (i.e., create an object and invoke it with arguments fetched from your database) to complete its integration with your implementation. To run as a sample, compile and execute in the source directory:
      java -cp . HTMLPageGen
      Output html files appear in .../samples/htdocs/. Open file index.html in your browser to view. (The pages have been tested only with Firefox 3.5.7. There's no known reason why they shouldn't work with others...)

      Note: some text files required by HTMLPageGen.java are included in the directory with it.

    samples/images contains some sample photos.

    samples/htdocs is the output directory for HTMLPageGen.java.

The tar file also creates directories images, htdocs, and java (as siblings to samples). These are the intended destinations for production copies of things.


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]