CSE/EE 461: Introduction to Computer-Communication Networks, Autumn 2007

Homework 6: Routing
Out: Wednesday November 7
Due: Monday November 19

Additional Files

  • batchRun.pl - 11/14/2007
    Runs a specified number of simulations, plots cumulative distribution functions (cdf's), and computes means and variances for Score, Average Reachable Fraction, and Average Available Fraction. The arguments are the number of simulations followed by the "run" arguments, e.g.,
    $ ./batchRun.pl 100 200 1 40 50
    $ ./batchRun.pl 100 1 testGraphFile
    The first invocation will run 100 simulations over 200 time steps, each with 1 initial failure, 40 nodes, and 50 edges. The second invocation will run 100 simulations using the topology specified in testGraphFile. The Score cdf is written to simscore.(pdf|ps), and the Average Reachable Fraction and Average Available Fraction cdf's are written to simcdf.(pdf|ps). Means and variances are written to standard output.

    batchRun.pl needs to be executed from the same directory as "run", and you will need the following installed on your system:

Version History

  • V1.2.6 - 11/8/2007
    An update that merely removes a vestigial field from the LogEntry class. (No executable lines of code were changed. The only functional impact of this update is that, maybe, you can run very slightly larger simulations before running out of memory.)

  • V1.2.5 - 11/7/2007
    Original distribution.

Overview

This assignment is about the link state and distance vector approaches to maintaining routing tables, and in particular their transient behavior when subjected to router failure and recovery and their ability to scale to large networks. As an extra, no-cost bonus, you'll also get experience with one of the difficulties of trying to work on Internet-scale systems: their size makes simulating them at full scale at least painful, and in any practical sense impossible.

In spirit, this assignment is similar to the RFID assignment: you begin with a base implementation, which works badly, and fix the structural problem it has by writing a tiny bit of code. You then try to find an approach that works as well as possible. The base implementation you start from is a form of distance vector. As part of "your tuning" you may end up implementing link state as an alternative.

The main deliverable for the assignment is a report, described in more detail below, summarizing what you learned thinking about how to maintain (sufficiently) consistent distributed state (i.e., the routing tables), and how system size affects the approach you might take. You will also hand in any code you change, but there is a good chance we won't ever look at it.

Transient Behavior and Scaling Properties

It is reasonably straightforward to argue (and understand) that both distance vector and link state converge to a consistent set of shortest path routing tables, given enough time during which nothing changes - no failures and no additions. What happens when there are changes, though?

This graph shows two measures of the quality of the routing tables in a network with 10 routers ("nodes") and 14 links ("edges") that is subject to node failure and recovery. The network was allowed to run with no changes to the point that the routing tables had converged. The graph starts at the moment when two nodes have failed. (Router failures and recoveries occur randomly after that.) I'll define the performance measures graphed more carefully in a moment, but for now it's enough to know that bigger is better -- a value of 1.0 represents a fully converged, failure-free network.

Here is a graph for a network with 400 nodes and 500 edges, beginning at a moment when one node has failed. Obviously, things are not as good as in the smaller system. It is possible to make them better, but only at the cost of devoting more of the network's capacity to the overhead of maintaining the routing tables.

The trade-off between that overhead and the achieved performance is the engineering design task. As an idealization, we want a scheme that converges instantaneously, and consumes no resources to do so. Of course, that's impossible. Your goal is to develop a scheme that, in your opinion, best trades off the delay to converge against the on-going overhead required to do so, and achieves that over as large a range of network sizes as possible. To help evaluate alternatives you might think of, we have defined a measure of "goodness" - each run of the simulator is given a score, and your goal is to find a scheme that, on the whole, maximizes that score. (The score metric will be defined in detail in a moment.)

Simulator Overview

The simulator has been written to try to minimize the amount of code you have to look at to accomplish your task. Of course, that's a nice idea, but there are sometimes good reasons to look beyond the small bit that actually needs changing. This brief overview is a road map to the overall structure. Extensive javadoc documentation provides additional detail, the course staff is prepared to answer questions at all levels of detail, and of course the source itself is fully available to you.

The simulator implements a "slotted time" model. Roughly, in each time step each router is given a chance to run its routing table maintenance procedure. That involves reading any update messages it may have received, updating its routing table if necessary, and possibly sending update messages to neighbor routers. Messages sent during one time step are not available for reading by the destination node until the next time step.

Because any real network is much more asynchronous than the slotted time simulator, a bit of asynchrony is inserted by randomly skipping each node's update during each time step - with high probability the update call occurs, but it might not. Any messages queued for the node remain queued until it next runs (so long as it doesn't fail first).
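
To make the control flow concrete, here is a minimal sketch of one slotted time step. All of the names here (SlottedStepSketch, SKIP_PROBABILITY, and so on) are mine, invented for illustration; the real control flow lives in the Network and Node classes.

    import java.util.Random;

    // Illustrative only: a rough rendering of one slotted time step.
    class SlottedStepSketch {
        static final double SKIP_PROBABILITY = 0.1; // chance a node sits this step out (assumed value)
        static final Random rng = new Random();

        static void stepAll(SketchNode[] nodes) {
            for (SketchNode node : nodes) {
                if (!node.isUp()) continue;                        // a failed node does nothing
                if (rng.nextDouble() < SKIP_PROBABILITY) continue; // injected asynchrony: the node's
                                                                   // queued messages wait until it next runs
                node.processQueuedUpdates(); // read messages sent in earlier time steps
                node.maybeSendUpdates();     // messages sent now arrive next time step
            }
        }

        interface SketchNode {
            boolean isUp();
            void processQueuedUpdates();
            void maybeSendUpdates();
        }
    }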

The simulation starts at a time when all routing tables are completely correct. A number of nodes given by an invocation argument are then failed, and the simulation begins. From that point on, any node that is up fails with some probability each time step, and any node that is down recovers with some probability.

A node learns of the failure of a neighbor through a timeout mechanism - if no message is received from a neighbor for enough consecutive time steps, the node concludes the neighbor is down. The neighbor may actually be down, or this may be a "false positive" - the neighbor might just have been slow (skipped a few time steps) and is actually still up.

At the end of each time step the simulator determines the actual state of the network (using "oracle" knowledge of which nodes really are up and which are down) and compares it to the state implied by the routing tables. Various metrics (described below) are calculated based on this comparison. At the end of the simulation these values are averaged across all time steps, and then an overall score, along with some of its more detailed components, is printed.

The implementation consists of the following significant pieces:

  • class sim
    This is where the mainline is. It is basically in charge of driving the simulation -- reading command line arguments, constructing the network, stepping it forward, analyzing the network each step and keeping track of the score, and printing results when done.

  • class Network
    The network object knows about all the nodes and edges in the graph. The simulator invokes it each time step, and it then iterates over all the nodes, invoking their time step method.

  • class Node
    This is an abstract class; it must be extended by a concrete subclass.

    Each node has an "address," which is called its UID in the code. UIDs are assigned consecutively, starting at 0.

    Each node has a routing table. Unrealistically, to simplify everything, a node magically knows how many other nodes there are in the network. (That is, it doesn't have to learn this, as it would in a real system, it just asks the network object and magically gets the information.) One thing this simplifies is that the routing tables can just be arrays of a fixed size.

    Nodes provide the routing table maintenance logic. They exchange messages with each other, and process the ones they've received. They also detect that a neighbor has gone down using a timeout. The node class itself contains just the basic control flow for that process. The implementation of a specific algorithm goes into a subclass.

    Note that a Node is allowed to send a message ONLY to one of its neighbors. The code has been implemented to discourage the natural mistake of trying to send a message to an arbitrary destination in one time step.

  • class DVNode extends Node
    This class is two things: an example of how to extend the node class, and an implementation of a simple distance vector scheme for maintaining routing tables. As mentioned previously, the scheme has a bug, and part of the assignment is to find it. (Another part is to show that you've found it by fixing it.) I'll explain the distance vector scheme used in a bit.

  • class DVNodeFactory
    Well, it's a DVNode factory - a class that knows how to produce DVNode objects. This code pattern is required by the package I'm using, as well as being a good idea on its own. If you create a new routing table maintenance algorithm, and a new subclass of Node, you'll need to produce a new factory class as well. Look at this one. It's about one line of code.

  • class UpdatePacket
    A superclass for all messages sent between nodes. It does pretty much nothing; we just need a common class for structures that hold these messages. It does contain the UID of the sender.

  • class DVUpdatePacket
    The information in a message depends on the scheme being used. This class carries the information needed by the DVNodes. If you create a new routing table maintenance algorithm, and a new Node subclass, you'll want a new UpdatePacket subclass as well.

  • class RoutingTable
    It's primarily an array of (nextHop, distance) pairs, indexed by the destination's UID, plus a bunch of methods for accessing and modifying those entries. Additionally, it can keep a log of all modifications made to its entries, which is key to the DVNode scheme (more on that later).

  • class ReachabilityChecker
    A class that can examine both the true network state and the state of the routing tables, and can compute some of the needed metrics.

  • class ScoreKeeper
    A class used to accumulate metric values across time steps, and to compute the score function.

  • class ConstXXX

    There are three ConstXXX.java files/classes. These simply hold symbolic constants. There are three to distinguish the kinds of uses of the symbolic constants:

    • ConstConfig.java
      Constants that affect how the code compiles/runs. One constant turns on/off some slightly verbose printing, useful for debugging. Another two let you choose to either get different random numbers each invocation of the program, or the same stream each invocation. The second option is useful for debugging, the first for experimenting with how well a scheme is doing. The final one turns on/off writing a data file that can be used post-run to generate a graph like the ones shown above. You should set these however you like.

      The default settings cause "repeatable random number generation," meaning you'll get only one random network generated for each network size. You'll want to look at more random networks than that, to get an idea of statistics like average performance, so you'll definitely want to change that setting at some point.

    • ConstSysProperties.java
      These are properties of the network: the node mean times to failure and recovery, the relative cost of an update packet (in the scoring function), and the probability that a node will be skipped during a time step.

      These are not things that can be used to tune your algorithm. You can change them from run to run, though, to get an idea of how robust your scheme is to different network conditions, but that's all.

      When/if we run your code, we might well change these values from the ones in the distributed copy of this file.

    • ConstAlg.java

      These are symbolic constants that reflect parameters of the routing table maintenance procedure. They would be part of the router software in a real system, so you are free to adjust them in an attempt to tune your application (and to introduce new ones, of course).

  • class MyRandom
    A simple subclass of Random that helps provide repeatable random number sequences.

The DVNode Scheme

In the classic distance vector scheme, each node sends its distance to every destination to each of its neighbors. Neighbors then maintain their own routing tables by adding one to the minimum distance of their neighbors' routing table entries for each destination, and setting the next hop to the router whose table contains the minimum.
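
As a rough sketch of that rule (names and structure are mine, purely for illustration; the simulator does not actually work this way, as explained next):

    import java.util.Map;

    // Illustrative only: the classic update at one node, assuming it has just
    // received a full distance table from every neighbor.
    class ClassicDVSketch {
        static final int INF = Integer.MAX_VALUE / 2; // "unreachable", still safe to add 1 to

        // neighborDistances maps a neighbor's UID to that neighbor's full distance table
        static void classicUpdate(int myUid, int numNodes, Map<Integer, int[]> neighborDistances,
                                  int[] myDistance, int[] myNextHop) {
            for (int dest = 0; dest < numNodes; dest++) {
                if (dest == myUid) { myDistance[dest] = 0; myNextHop[dest] = myUid; continue; }
                int bestDist = INF, bestHop = -1;
                for (Map.Entry<Integer, int[]> e : neighborDistances.entrySet()) {
                    int viaNeighbor = 1 + e.getValue()[dest]; // one hop to the neighbor, plus its distance
                    if (viaNeighbor < bestDist) { bestDist = viaNeighbor; bestHop = e.getKey(); }
                }
                myDistance[dest] = bestDist;
                myNextHop[dest] = bestHop;
            }
        }
    }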

That approach has two implications. The first is a property of the system being simulated: big messages are being passed around. If it were practical in Java to do so, the scoring function would penalize you for those big messages. (It is in fact possible; I did implement it. The penalty of measuring message sizes was a mere factor of 10 slowdown in the overall speed of the simulator, though, so I took it out.)

The second problem is a problem for the simulator's implementation. There are something like 4E routing-table-sized data structures in the simulator, where E is the number of links in the network - each router has its own routing table; the other endpoint of every link has a cached copy as well; and links are bi-directional. I wanted to be able to simulate networks as large as possible, and running out of memory is the main constraint, so having so many full copies of routing tables lying around didn't seem like such a good idea.

To remedy these problems, I "improved" the distance vector algorithm a bit. First, instead of sending full routing tables around, each node sends only those entries that it has changed since the last time it sent an update packet. (These changed entries are the "log" that the RoutingTable class knows how to create.) This results in each node sending much less data to its neighbors - just those destination distances that have changed since its last message - which reduces message sizes, for which we're rewarded in spirit even if Java gets in the way of rewarding us in the score.

Second, because I can't afford to keep a copy of each neighbor's routing table (for simulator memory reasons), I don't. If I hear an updated distance from a neighbor to some destination, I first check if it's a better route to the destination than the one currently recorded in my routing table. If so, I change the routing table entry accordingly. If not, I check to see if I'm using that neighbor as the next hop to that destination. If I am, I update my routing table to reflect the new distance, keeping the same neighbor as the next hop. I don't know at this point that there isn't now a better route through some other neighbor, since I haven't kept their information around, but I expect to get update messages from them pretty soon, and so will have a chance to change my mind (about which neighbor is best for the next hop) when I do.

That's pretty much it.
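
To make the update rule concrete, here is a minimal sketch of the processing of a single (destination, distance) entry heard from a neighbor. The names are mine; the real logic is in DVNode.java.

    // Illustrative only: the modified distance vector rule described above.
    class IncrementalDVSketch {
        int[] distance; // distance[dest] = current best known distance to dest
        int[] nextHop;  // nextHop[dest]  = neighbor currently used to reach dest

        void onHeardEntry(int from, int dest, int advertisedDistance) {
            int viaFrom = advertisedDistance + 1; // one extra hop to reach 'from' first
            if (viaFrom < distance[dest]) {
                distance[dest] = viaFrom; // strictly better route: adopt it
                nextHop[dest] = from;     // (this change would be logged for our own next update)
            } else if (nextHop[dest] == from) {
                // Not better, but it comes from the neighbor we already route through,
                // so our recorded distance must track its new (possibly worse) value.
                // A better route via some other neighbor, if one exists, gets adopted
                // when that neighbor's own update arrives.
                distance[dest] = viaFrom;
            }
            // Otherwise: ignore the entry; we keep our current, better route.
        }
    }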

As shown in this graph, there is some problem with this scheme, or at least with its specific implementation in the base code.

The DVNode Implementation

Class Node handles almost all of the boilerplate control flow. It also defines a set of six "callbacks" that its subclasses, including DVNode, must implement. Three of them are part of the basic time step loop. The subclass gets a call when its turn comes up to take a step. This call happens before anything else, and lets it do any one-time processing that might be required to set up its environment. Another callback is then invoked repeatedly, once per update message in the node's incoming queue of messages. A single update message is passed as an argument in each of these calls. Finally, when the update message queue is exhausted, a final callback allows the subclass to do any post-processing/cleanup that might be required.

In addition to those three callbacks, there are three more, having to do with failure and recovery events: one to indicate that this node itself has just gone down; one to indicate that this node has just come back up; and one to indicate that a neighbor hasn't been heard from in so long that we've concluded it has crashed.

All of this is implemented in DVNode.java, which in theory is all you should have to read or modify to fix the base implementation. It is also pretty much all you need to figure out how to implement an entirely different scheme (for example, link state), with the addition of DVNodeFactory.java and DVUpdatePacket.java. It wouldn't be too surprising if somewhat wider reading of the code and javadoc were necessary for that second purpose, although it really shouldn't be much more.
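
To give a feel for the shape of things, here is a hypothetical skeleton of a Node subclass and its factory. The callback names, signatures, and the stand-in classes (NodeSketch, UpdatePacketSketch) are all mine; the real names and signatures are defined by Node and documented in the javadoc.

    // Illustrative only: the six callbacks a subclass provides, and a factory.
    abstract class NodeSketch {
        abstract void onStepBegin();                        // before anything else this time step
        abstract void onUpdatePacket(UpdatePacketSketch p); // once per queued update message
        abstract void onStepEnd();                          // after the message queue is exhausted
        abstract void onWentDown();                         // this node itself just failed
        abstract void onCameUp();                           // this node just recovered
        abstract void onNeighborTimeout(int neighborUid);   // a neighbor is presumed down
    }

    class UpdatePacketSketch { int senderUid; }

    class MyNode extends NodeSketch {
        @Override void onStepBegin()                        { /* per-step setup */ }
        @Override void onUpdatePacket(UpdatePacketSketch p) { /* process one message */ }
        @Override void onStepEnd()                          { /* send updates, clean up */ }
        @Override void onWentDown()                         { /* discard state, if appropriate */ }
        @Override void onCameUp()                           { /* reinitialize the routing table */ }
        @Override void onNeighborTimeout(int neighborUid)   { /* drop routes via that neighbor */ }
    }

    // As noted above, a factory really is about one line of code.
    class MyNodeFactory {
        NodeSketch create() { return new MyNode(); }
    }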

What To Do

You will hand in a report, plus any code that you've written or modified. The report should address the following.
  1. Identify what is wrong with the DVNode scheme and/or implementation, and demonstrate that you have indeed correctly identified it by implementing a fix and showing that the fixed version no longer exhibits the problem.

  2. Try to find a scheme that is both effective and robust. By effective I mean that it achieves a high score. By robust I mean that it does so over as broad a range of system parameters -- number of nodes and edges, and the parameters in ConstSysProperties, say -- as possible. The two are inter-related; you can get a higher score for some particular network by tuning for it alone, but you're then likely to do much more poorly for networks of other sizes or failure rates, say.

    This effort is the bulk of your report. Tell us about each thing you tried, why you tried it, and how you went about trying it. Your main goal is to explain why it was worth your time to try the things you did, rather than some alternative. No one has the time to try every possibility, pretty much ever, so you want to be sure to spend what time you have on those things most likely to bring the biggest returns. That is, something is worth trying when, considering both the time it will take and the gain you think it's likely to provide, it seems attractive to do so. "It was easy to try and I had no idea what the gains would be" is justifiable, because the cost of trying is so small. "Even though it was hard to try, I was pretty sure that the gains were large, for these reasons..." is also justifiable. "I had no idea how long it would take nor whether it could even conceivably be of any use" is hard to justify.

    The emphasis is truly on this process. Someone else in the class may well stumble upon a scheme that gets a higher score than yours, but have done so in a way that is the equivalent of a retirement plan based on winning the lottery. While in life you're rewarded for dumb luck, and penalized for its opposite, in university you don't have to be - a good idea, with well thought out motivation, that doesn't work out for unexpected reasons is just fine with us. (An idea that clearly wasn't going to work even before trying it isn't, though.)

  3. One natural idea is to use a link state based protocol in place of a distance vector one, for reasons explained in class and the text. As a special extra credit bonus, worth up to the equivalent of 5 midterm points, you can implement a link state protocol, obtain simulation results for it, and contrast its strengths and weaknesses with the fixed DVNode approach. This offer is open to you only after you've made some reasonable attempt to try other, less time-consuming approaches to obtaining an efficient, robust algorithm, though.

    The bonus points are also available without implementing anything. In this case, though, you have to: (a) tell us precisely what your scheme is, in quite high detail, and (b) use some alternative (to experimentation) to support the claim that it would do well if implemented. The burden will be on you to convince us that your scheme works well, not on us to convince you that it doesn't. Partial bonus credit is possible.

  4. Your report should explicitly discuss the scalability of your final approach. What (most) limits its ability to run effectively on larger and larger systems?

The Metrics

The score is computed by averaging the following value across all time steps (a sketch of the per-step computation follows the list):
  • the fraction of routing table entries that are "correct." An entry is correct if it names a next hop, and if forwarding a packet to that next hop would result in the packet eventually reaching the destination, assuming that the tables didn't change while it was in flight. It is also correct if the entry indicates there is no path to the destination, and in fact there isn't one (because actual failures have partitioned the network).

  • Subtract from that the fraction of routing table entries that are incorrect.

  • Now subtract from that some constant C times the average number of update packets each node sent every other node (not just neighbor nodes, but really every other node). The constant C is the symbolic constant ConstSysProperties.UPDATE_PACKET_OVERHEAD_WEIGHT, and is set to 0.01 by default.
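
Put together, the per-time-step value being averaged looks roughly like this. The exact accounting (in particular, how the packet term is normalized) is defined by ScoreKeeper; this is only a sketch of the description above, with names of my own choosing.

    // Illustrative only: the per-time-step contribution to the score.
    class ScoreSketch {
        static double perStepScore(int correctEntries, int incorrectEntries, int totalEntries,
                                   long updatePacketsThisStep, int numNodes, double c) {
            double correctFrac   = (double) correctEntries   / totalEntries;
            double incorrectFrac = (double) incorrectEntries / totalEntries;
            // average number of update packets each node sent to each other node
            double perPair = (double) updatePacketsThisStep / ((double) numNodes * (numNodes - 1));
            return correctFrac - incorrectFrac - c * perPair; // c = UPDATE_PACKET_OVERHEAD_WEIGHT (0.01 by default)
        }
    }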

The graphs show two metrics, which I've called availability and reachability. Availability is the average fraction of all destinations that can be reached from each router, including routers that are down (and so can't reach any). The idea is to assume that there are an equal number of users on the networks behind each router, and that they send traffic uniformly to all destinations. The metric indicates what fraction of it will be delivered.

Reachability is the ratio of two counts: the number of routing table entries, across all tables in the network, that indicate a packet should be forwarded and that are correct (the packet will arrive, given the routing tables at the next and every subsequent hop), divided by the sum, across all routers, of the number of destinations actually reachable from that router. That is, reachability measures how good a job the routing tables have done of simply getting a packet to its destination, given that it is possible to do so. Reachability does not penalize actual partitioning of the network due to node failures. Availability does.
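
Written out as formulas (my paraphrase of the two definitions above, with N the number of routers; not a transcription of the code):

    \text{availability} = \frac{\#\{(s,d):\ s \neq d,\ \text{a packet from } s \text{ to } d \text{ is delivered using the current tables}\}}{N(N-1)}

    \text{reachability} = \frac{\sum_{s} \#\{d:\ \text{the entry for } d \text{ at } s \text{ forwards the packet and it arrives}\}}{\sum_{s} \#\{d:\ d \text{ is actually reachable from } s\}}

In the availability numerator a down source delivers nothing, which is how failed routers and partitions lower availability but not reachability.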

Building and Running the Code

  • You need some minimally recent version of Java. I'm not 100% sure what, but I'm guessing 1.5. The current version is 1.6, which is what I used. 1.6 is installed on attu.

    You'll know that your version is too old if you get compiler errors on the base code, and vice versa. Alternatively,

    [attu4] ~> java -version
    java version "1.6.0_03"
    Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
    Java HotSpot(TM) Server VM (build 1.6.0_03-b05, mixed mode)
    

  • To build, you can use the bash script called build:
    $ ./build
    Or, you can read the script to find out what you need to know to use any alternative build process.

  • To run, use the run script (or read it and then use whatever you'd like). The run script simplifies, very slightly, the direct invocation of the simulator, which is:
    Usage: java -Xmx300m -cp '.;jgrapht-jdk1.5.jar' sim nsteps nfail nnodes nedges
           java -Xmx300m -cp '.;jgrapht-jdk1.5.jar' sim nsteps nfail graphFileName
    
    The first creates a random topology, the second uses the one
    described in the file.
    
            nsteps is the number of timesteps to run.
            nfail is the number of nodes to fail.
            nnodes is the number of nodes in the network.
            nedges is the number of edges in the network.
            graphFileName is the name of a file containing a description
              of the network topology: #nodes on a single line followed by
              pairs of integers representing edges (i.e., the IDs of the
              two nodes, where IDs are assigned consecutively starting
              at zero).
    
    The run script contains the invocation line up through 'sim'. You supply the arguments that come after that, e.g.,
    $ run 2000 1 400 500
    $ run 2000 1 testGraphFile
    The first invocation causes the simulator to randomly create a network with 400 nodes and 500 edges. The second causes it to use the topology in the named file, which is a chain. (A small example of the topology file format appears at the end of this section.)

    The run script also fixes one of the incompatibility problems between running cygwin on Windows and running on real Linux: the ';' in the command line is required for Windows, but is an error on Linux. On Linux, a ':' is required, but that's an error on Windows. The run script figures out what system it's running on and punctuates correctly.

  • To create graphs, you can use the createGraph.pl perl script:
    $ ./createGraph.pl
    This produces simgraph.ps and simgraph.pdf versions of the graph.

    For this to work, some more things must be installed on your system:

    I put a (x86 Linux) copy of the jgraph executable at attu:/cse/courses/cse461/07au/jgraph.

    If jgraph is not on your PATH, you must make a simple, well commented edit to what will probably look like createGraph.pl's first executable line.
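
Finally, as mentioned above, here is what a small topology file might look like, following the format described in the usage text: the number of nodes on the first line, then one edge per line as a pair of node IDs. This particular three-node chain is my own example, not the distributed testGraphFile.

    3
    0 1
    1 2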

Turn-In Procedures

Online turnin.
