Today's software is broken and unreliable. Software failures impose a significant cost on our economy: NIST estimates around $60B per year. Beyond the financial cost, software failures have tangible consequences in the real world, such as the large-scale blackout in the northeastern US in 2003. This project focuses on helping programmers build correct software and on building systems that remain reliable in spite of broken software.

The problem is compounded by the fact that concurrent programming is more important than ever, since most commercial processors have multiple cores. To take full advantage of these processors, programmers need to write multithreaded software. Unfortunately, writing correct, reliable concurrent programs is extremely difficult. We believe that the way forward is to design architectures and build systems that not only help programmers find the errors in their concurrent programs, but also automatically avoid failures caused by concurrency errors.

We are developing systems that can detect and survive a broad range of concurrency bugs. Concurrency bugs manifest nondeterministically, i.e., they depend on a particular bad sequence of events executed by different threads. The operations of different program threads can interleave in many different ways during an execution, and every one of these interleavings respects the semantics of the source program. Our concurrency error debugging work aims to expose to programmers the multithreaded executions that lead to failures. Our tools and techniques help programmers understand a failure so they can fix their code and prevent the failure from recurring. Our failure avoidance work aims to make the system automatically avoid multithreaded executions that lead to a failure. This is possible by designing systems that use analysis to find possible failures and then take avoidance actions to prevent them.
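To make the role of interleavings concrete, the following minimal sketch enumerates every way two threads can interleave a non-atomic increment of a shared counter. It is an illustration only, not one of our tools: the two-step load/store model and the names `interleavings`, `run`, and `classify` are assumptions made for this example. Thread execution is simulated deterministically, so each schedule can be examined in isolation.

```python
from itertools import combinations

# Assumption for illustration: each hypothetical thread performs
# "counter += 1" as two separate steps, a load followed by a store.
THREAD_OPS = ("load", "store")


def interleavings(n_a, n_b):
    """Yield every schedule of two threads as a tuple of thread ids (0 or 1).

    Each thread's own operations keep their program order; only the
    relative order across threads varies.
    """
    total = n_a + n_b
    for slots in combinations(range(total), n_a):
        slot_set = set(slots)
        yield tuple(0 if i in slot_set else 1 for i in range(total))


def run(schedule):
    """Simulate one schedule and return the final value of the shared counter."""
    counter = 0
    local = [None, None]  # per-thread private copy of the loaded value
    pc = [0, 0]           # per-thread index of the next operation
    for tid in schedule:
        op = THREAD_OPS[pc[tid]]
        if op == "load":
            local[tid] = counter
        else:  # store
            counter = local[tid] + 1
        pc[tid] += 1
    return counter


def classify():
    """Map every possible schedule to the final counter value it produces."""
    return {s: run(s) for s in interleavings(len(THREAD_OPS), len(THREAD_OPS))}


if __name__ == "__main__":
    for schedule, final in classify().items():
        status = "ok" if final == 2 else "lost update"
        print(schedule, final, status)
```

Under this model there are six schedules. Only the two in which one thread finishes its increment before the other starts leave the counter at 2; in the remaining four, one update is lost. Every schedule is a legal execution of the same program, which is why a bug of this kind may appear only occasionally in testing.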

More information on our concurrency error detection research can be found here.

More information on our research about avoiding failures in concurrent programs can be found here.