Title: New Algorithmic Tools for Distributed Similarity Search
Advisor: Paul Beame
Supervisory Committee: Paul Beame (Chair), Jevin West (GSR, iSchool), Luis Ceze, Mark Oskin, and Magda Balazinska
Abstract: The task of finding similar vectors in a large dataset arises as a subroutine in many applcations, such as image seach and collaborative filtering. I will describe some recently completed theoretical work that provides new upper and lower bounds on the communication required to report all close pairs in Hamming distance. I will also propose two future projects, centering around the implementation of distributed algorithms for similarity search based on Hamming distance and Edit distance. I will mention connections to edge-isoperimetry, distance correlations, DNA clustering, and metric embeddings.