Tom Anderson

Robert E. Dinning Professor

Mike Dahlin and I recently completed the second edition of our undergraduate operating systems textbook, Operating Systems: Principles and Practice. The new version is available now from Barnes & Noble, Amazon, and your local bookstore. Note that B&N often has the book in stock even when Amazon does not.
Slides and selected code examples are available.

My research concerns the practical issues in constructing robust, secure, and efficient computer systems. I see myself as a generalist -- I am attracted to the biggest problem I can find, regardless of area. My most recent work has been on improving Internet availability and constructing next-generation peer-to-peer systems. I've also done research in operating systems, distributed systems, software engineering, system security, file systems, computer architecture, and educational software. I haven't written a database, AI, or graphics paper yet, but give me time.

A recent focus of my work has been to develop ways to dramatically improve Internet availability and DoS resilience. This is essential if the Internet is to be used for critical applications such as real-time health monitoring and response. Despite massive investment by ISPs worldwide, Internet availability remains poor, with hundreds of outages occurring daily, even in North America and Europe. Some have suggested that addressing this problem requires a complete redesign of the Internet, but I argue that considerable progress can be made with a small set of mostly backward-compatible changes to the existing Internet protocols. For example, many outages occur on a fine-grained time scale due to the convergence properties of BGP, the Internet's interdomain routing system. We have developed a set of additions to BGP that retain its structural properties, applying lessons from the fault-tolerant distributed systems research community to dramatically improve BGP availability. Other outages are longer-lasting and occur due to complex interactions between router failures and router misconfiguration. We have recently built a tool called Reverse Traceroute to help localize these problems. Localization by itself isn't enough, so we are currently designing and evaluating a system that leverages Reverse Traceroute for automated repair of persistent Internet outages.

Another focus of my current research is to build the next generation of peer-to-peer (P2P) systems as a platform for scalable, resilient, and secure global-scale applications. Over the long term, we will all become content producers, rather than just consumers. What should the platform be for widespread sharing of that content? Industry is moving to centralize computing and data, but that has significant downsides in terms of centralized ownership of data, and (I would argue) it will eventually lead to barriers to innovation. P2P provides an alternative to the cloud for sharing user-generated content, but achieving that goal poses a set of technical challenges in terms of performance, privacy, and resilience. As a starting point, we have built a high-performance, privacy-preserving P2P system called OneSwarm, in active use by thousands of users, carrying several orders of magnitude more traffic than the more widely known Tor system.

Slides giving the motivation and preliminary results for these two projects: here and here.

I studied philosophy as an undergraduate, and so I am perhaps the only person in the world who had both John Rawls and Ed Lazowska on their thesis committee, although obviously not at the same time.