Title: Co-Designing Distributed Systems with Programmable Network Hardware

Advisor: Dan Ports

Supervisory Committee: Dan Ports (Chair), Sreeram Kannan (GSR, ECE), Arvind Krishnamurthy, and Magda Balazinska

Abstract:

The unprecedented scale and demand of today's datacenter applications present tremendous challenges to the design of distributed systems. These systems need to handle immense and unpredictable user traffic, remain highly available despite failures, keep data strongly consistent, and meet stringent performance SLAs. Existing approaches, however, fall short of meeting these requirements: they require extensive server coordination to guarantee data consistency, which leads to severe performance penalties, and they suffer from load imbalance in the presence of highly skewed workloads.

In this talk, I will discuss a new approach to designing distributed systems -- co-designing distributed systems with the datacenter network. Specifically, my work has taken advantage of new-generation programmable switches in datacenters to build several novel network-level primitives that offer strong guarantees. I then leveraged these primitives to enable more efficient protocol and system designs. I will describe three systems I built that demonstrate the benefit of this approach. The first two, Network-Ordered Paxos and Eris, virtually eliminate the coordination overhead in state machine replication and fault-tolerant distributed transactions by relying on network sequencing primitives to consistently order user requests. The third, Pegasus, substantially improves load balancing in a distributed storage system -- providing up to a 9x throughput improvement over existing solutions. To achieve this, Pegasus selectively replicates the most popular objects in the data store, and tracks and manages the location of replicated objects using an in-network coherence directory implemented in the switch data plane.

Place: CSE 303 (Allen Center)

When: Monday, August 5, 2019 - 10:30 to 12:30