CSEP552 Spring 2013 -- Assignment #3

Implement a P2P client

Out: Wednesday, May 22nd, 2013
Due: Monday, June 10th, 2013, by 5pm

[ overview | protocol description | what to do | what to turn in ]

Overview

In this assignment, you will implement a simple "unstructured" P2P node, execute it to connect it to a course network, and collect textual payloads from your classmates' nodes. The P2P protocol we are using is called CSEtella, and it is loosely based on Gnutella, one of the original fully decentralized P2P file-sharing systems. Instead of building a full file-sharing client, your CSEtella node will host a single, short text block; it shares this text block via a "reply" message, as described below.

Because your P2P node will act both as a client and as a server, other nodes will need to be able to connect to it. This means that if you run your node behind a NAT proxy, home router, or firewall, you will need to figure out how to poke the appropriate hole in it to allow incoming connections to reach your node. You'll also have to figure out the appropriate IP address to advertise from your node. For example, if your node is running behind a home router, you should advertise the IP address of your home network, rather than the non-routable IP address assiged to your node itself.

If you're unsure what this means, we recommend that you instead run your node on one of the attu.cs.washington.edu computers. Pick a random, large port number (e.g., in the range 20,000-40,000) so that you don't accidentally collide with a classmate.

CSEtella protocol description

CSEtella is a fully decentralized system. What this means is that every node acts like a client, a server, and a router. A node behaves as a client, in that it establishes connections to other nodes, and it sends "ping" and "query" messages to other nodes. A node behaves as a server, in that it accepts connections from other nodes, and it replies to ping and query messages from other nodes by sending back "pong" and "reply" messages, respectively. As well, a node behaves as a router, in that it routes ping, query, pong, and reply messages between nodes.

message
description
ping A ping message is sent by a node to discover other nodes on the network. When a node receives a ping message, it is expected to route the ping message to other nodes (i.e., it participates in a flood of the ping), and it is also expected to respond to the ping with a pong message.
pong A pong message is a response to a ping message. A pong includes the IP address and port number of the responding node. In a sense, then, a pong message advertises the responding node's availability to other nodes in the network. Each pong message is routed back to the node that initially flooded the ping; in other words, pong messages are not flooded.
query A query message is sent by a node to ask other nodes to share their text block. When a node receives a query message, it is expected to route the query message to other nodes (i.e., it participates in a flood of the query), and it is also expected to respond to the query with a reply message.
reply A reply message is a response to a query message. A reply message includes a short text block shared by the responding node. The reply message is routed back to the node that initially flooded the query message; in other words, reply messages are not flooded.

Note that TCP connections in CSEtella are "persistent." This means that many messages will flow in each direction over a single TCP connection: you should not be disconnecting after sending or receiving a single message.

Once launched, your CSEtella node should proceed in the following manner:

  1. Your node should create a listening socket so that other nodes can establish TCP connections to it.

  2. Your node should establish a TCP connection to the IP address and port number of at least one known CSEtella node. Note that Steve will attempt to maintain a CSEtella node on IP address 128.208.2.88 and port number 5002.

  3. Once it has established a connection to another node, your node should begin periodically sending ping messages on all peer connections. The other node will respond with a pong, and also flood the ping onwards through the CSEtella network, causing other nodes to respond with pongs as well. As a result, your node will receive many pong messages back from a single ping. Each pong message informs your node of another peer node that it can choose to connect to, if it wants.

  4. Your node should establish a handful of connections to other nodes, to improve the robustness of the overall P2P network. The specific number of connections you try to maintain over time is up to you.

  5. Periodically, your node should send a query message out to each of its connected nodes. The query message will be flooded out over the CSEtella network, causing a stream of reply messages to be routed back to your node. Your node should record the text block embedded in each reply message, since part of your turn-in will be a list of unique text blocks that you have harvested.

  6. Your node should be willing to accept incoming connections from other nodes. (Other nodes will discover your node's IP address and port number from the pong and reply messages that your node sends.)

Each CSEtella message shares a common wire format: a message header followed by an optional payload. Ping and query messages have no payload; they consist only of the message header. Pong and reply messages have a payload; they consist of a message header followed by a type-specific payload. All fields in messages are in network (big endian) byte order.


Message header



Ping

A ping message consists simply of a message header with its type field set to 0x00, and the payload_length field set to 0x00000000.


Pong


A pong message contains a message header followed by a six-byte-long payload; the pong payload is as shown above. Note that the message_ID of a pong message should be set to the same value as the message_ID of the ping message that caused the pong to be generated. This enables pong messages to be routed back towards the peer that originated the ping. Also note that the type field of a pong message should be set to 0x01.


Query

A query message consists simply of a message header with its type field set to 0x02, and the payload_length field set to 0x00000000.


Reply


A reply message contains a message header followed by a 6+(text block length) byte long payload; the reply payload is as shown above. Note that the message_ID of a reply message should be set to the same value as the message_ID of the query message that caused the reply to be generated. This enables reply messages to be routed back towards the peer that originated the query. Also note that the type field of a reply message should be set to 0x03.


Message routing

The peer-to-peer nature of CSEtella requires nodes to route network traffic appropriately. A well-behaved CSEtella node will route messages according to the following rules:

  1. Pong messages may only be sent along the same path that carried the incoming ping message. This ensures that only those nodes that routed the ping message will see the pong message in response. A node that receives a pong message with message_ID = N, but has not seen a ping message with message_ID = N, should remove the pong message from the network.

  2. Reply messages may only be sent along the same path that carried the incoming query message. This ensures that only those nodes that routed the query message will see the reply message in response. A node that receives a reply message with message_ID = N, but has not seen a query message with message_ID = N, should remove the reply message from the network.

  3. A node will forward incoming ping and query messages to all of its directly connected peers, except the peer that delivered the incoming ping or query message.

  4. A node will decrement a message's TTL field and increment its hops field before forwarding the message to any directly connected peer. If, after decrementing the TTL field, the TTL field is found to be zero, the message is not forwarded along any connection.

  5. If a node receives a ping or query message with the same descriptor ID as one that it has received before, it should avoid forwarding that message to any peer. (This implies your node will need to keep a cache of recently observed message headers.)


The image above shows an example of how an incoming ping is routed by a node, how a ping causes pong messages to be generated, and how the pongs are routed back along the paths that the pings took.

What to do

Implement a CSEtella peer, using any language, runtime, and operating system of your choice. We recommend that you first test your peer against itself -- i.e., run several copies of your peer and connect them to each other in a little private network. Once you think you have that working, go ahead and connect to the course network. As a reminder, Steve will (attempt to) keep a peer up and running on IP address 128.208.2.88 and port number 5002. You can use this to bootstrap your node: by routing ping messages through Steve's node, you will learn about other nodes on the network, and you can connect your peer to some of them.

Leave your peer running for as long as you can -- once you have it working, leave it up and running until after the assignment is due. Your goals should include the following:

Optional task #1

Build a CSEtella crawler. Your crawler should use a ping message with TTL 1 to discover the immediate neighbors of a node. Then, it should connect to each neighbor, and send a ping with a TTL of 2 to discover their neighbors. Continue until you've discovered all of the nodes and connections in the network. (How will you know when you're done?)

Use a graph drawing program like graphviz to visualize the structure of the network.

Optional task #2

Build a web server front end to your CSEtella peer. Your web server can report on a number of properties of your peer:

What to turn in

You should turn in the following:

Use the course dropbox to turn in your source code and your writeup.