I am interested in the interface between computer science and biology. Although I am in the computer science department, I aim to build on my undergraduate training in biology as much as possible, because computational biology without the biology is little more than a theoretical exercise. That said, my core interests lie in the computational aspects, hence my decision to pursue a CS PhD.
HIV's tremendous capacity for rapid adaptation is the root of drug resistance, vaccine failure, and immune failure (AIDS). Our work is on computational models that identify sources of selection pressure and the specific adaptations that arise in response to that pressure. For example, when a patient is given an antiviral drug, a new selection pressure is introduced to the virus; eventually, resistance mutations that protect the virus from the drug are selected for.
Our primary focus is on how HIV adapts to the cellular immune response. We are working with a number of collaborators on this project, with the goal of informing new approaches to vaccine design. Our models identify the host genetic variations (HLA alleles) that correlate with specific HIV mutations. Doing this in a statistically sound way requires that we account for the evolutionary history of the virus, linkage disequilibrium among HLA alleles, and compensatory mutations that create a dense network of dependencies among the HIV codons.
This work is based at Microsoft Research under the direction of my advisor, David Heckerman. The source code is available under an open source license (note that the current open-source release supports only pairwise correlations). The press announcement was well received and led to some nice articles on David and MSR in Business Week and on NPR.
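To make the idea of a single HLA-mutation correlation concrete, here is a minimal sketch of the naive pairwise test one might start from: a one-sided Fisher's exact test on a 2x2 table of patients cross-classified by whether they carry an HLA allele and whether their virus carries a given mutation. This is an illustrative assumption, not our model: it deliberately ignores viral phylogeny, HLA linkage disequilibrium, and codon-codon dependencies, which, as noted above, a statistically sound analysis must account for. The function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for a positive association.

    Hypothetical 2x2 table of patient counts:
                     mutation   no mutation
        HLA allele+     a            b
        HLA allele-     c            d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    NOTE: a naive baseline only; it ignores phylogeny, HLA linkage
    disequilibrium, and dependencies among codons.
    """
    row1 = a + b          # patients carrying the HLA allele
    col1 = a + c          # patients whose virus has the mutation
    n = a + b + c + d     # total patients
    denom = comb(n, col1)
    p = 0.0
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p


# Usage: 8 of 10 allele carriers vs 1 of 10 non-carriers have the mutation.
print(fisher_exact_greater(8, 2, 1, 9))
```

With many alleles and codons tested this way, a multiple-testing correction would also be needed on top of the phylogenetic concerns above.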
From fall 2003 to spring 2007, I worked with Prof. Bob Gross and Dr. Arijit Chakravarty on motif finding. Our first papers focused on identifying key bounds that enable efficient computation of DNA motifs of arbitrary length and degeneracy under any objective function. BEAM, PRISM, and SPACER (see pubs) focus on the identification of non-degenerate, degenerate, and bipartite motifs, respectively. We found that beam-search algorithms let us heuristically limit the search space in such a way that expressive consensus motifs can still be learned. Although the consensus motif representation is less expressive than the position weight matrix (PWM), we found that a good search over the full space of consensus motifs yields better results than heuristic searches over PWM space, which are prone to local optima.
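To illustrate how a beam search can prune the space of consensus motifs, here is a minimal sketch. It assumes a toy objective (the number of input sequences containing at least one match) in place of our actual scoring metric, a subset of the IUPAC degenerate nucleotide codes, and a hypothetical cap on the number of degenerate positions; the names `coverage` and `beam_search_motif` are illustrative and are not taken from BEAM or PRISM.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]"}


def coverage(motif, seqs):
    """Toy score: number of sequences with at least one match of the consensus motif."""
    pattern = re.compile("".join(IUPAC[ch] for ch in motif))
    return sum(1 for s in seqs if pattern.search(s))


def beam_search_motif(seqs, length, beam_width=10, max_degenerate=1):
    """Grow consensus motifs one position at a time, keeping only the
    beam_width best-scoring partial motifs at each step (hypothetical parameters)."""
    beam = [""]
    for _ in range(length):
        candidates = []
        for motif in beam:
            n_degen = sum(ch not in "ACGT" for ch in motif)
            for ch in IUPAC:
                # Skip degenerate codes once the (assumed) degeneracy cap is reached.
                if ch not in "ACGT" and n_degen >= max_degenerate:
                    continue
                candidates.append(motif + ch)
        # Highest score first; ties broken alphabetically for determinism.
        candidates.sort(key=lambda m: (-coverage(m, seqs), m))
        beam = candidates[:beam_width]
    return beam[0], coverage(beam[0], seqs)


seqs = ["TTACGTGG", "CCACGTAA", "GGACGTTC", "ATATATAT"]
print(beam_search_motif(seqs, length=4, max_degenerate=0))
```

Because extending a motif can only keep or lower its coverage, the beam retains the most promising prefixes. The degeneracy cap stands in for a score that accounts for search-space size; without some such penalty, degenerate codes would always win under this toy objective.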
Most recently, we combined these three focused motif finders into an ensemble method, SCOPE. SCOPE runs BEAM, PRISM, and SPACER and then combines their results; because all three use the same scoring metric, which accounts for the size of the search space, their candidate motifs can be compared directly. We found that searching for three classes of motifs, each with a finder specialized for that class, yields an ensemble that significantly outperforms its component finders as well as a number of other approaches.
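The combination step can be sketched as pooling every finder's candidates and ranking them under one shared score. This is a minimal sketch under stated assumptions: the finders here are hypothetical stubs and the scoring function is a placeholder, not SCOPE's actual metric.

```python
from typing import Callable, Dict, List, Tuple

Finder = Callable[[List[str]], List[str]]
Score = Callable[[str, List[str]], float]


def ensemble(finders: Dict[str, Finder], score: Score,
             seqs: List[str], top_k: int = 3) -> List[Tuple[str, str, float]]:
    """Run each focused finder, score every candidate with one shared metric,
    and return the top_k (finder name, motif, score) triples overall."""
    pooled = []
    for name, finder in finders.items():
        for motif in finder(seqs):
            pooled.append((name, motif, score(motif, seqs)))
    # A shared metric makes candidates from different finders directly comparable.
    pooled.sort(key=lambda t: (-t[2], t[1]))
    return pooled[:top_k]


# Hypothetical stand-ins for specialized finders and a placeholder score.
finders = {"nondegenerate": lambda s: ["ACG"], "other": lambda s: ["TTT", "GG"]}
count_score = lambda m, s: sum(m in x for x in s)
print(ensemble(finders, count_score, ["ACGTT", "ACGGG", "TTTAC"]))
```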
During the summer of 2005, I worked at Virtify, where I built a computer vision library for extracting features from high-throughput immunofluorescence images in the context of lead compound discovery.
My undergraduate thesis was on genome-wide distributions of consensus motifs. I built a relational database of all 9-mers in the upstream regions of A. thaliana (a plant) and used statistical measures of each motif's distribution relative to the translation start sites to find transcription factor binding sites. The project was promising but was never carried through to publication; we presented our results at TIGR's annual conference.
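A relational layout for this kind of analysis can be sketched with an in-memory SQLite table holding every k-mer with its distance to the translation start site. This is a minimal sketch, assuming each upstream sequence ends immediately before the start codon; the table schema, function names, and the mean-distance ranking are hypothetical illustrations, not the statistics used in the thesis. A real analysis would compare each k-mer's positional distribution against a background model.

```python
import sqlite3


def build_kmer_table(upstream, k=9):
    """upstream: dict gene_id -> upstream sequence ending right before the start codon.
    Stores every k-mer with dist = bp from the k-mer's first base to the start site."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE kmers (gene TEXT, kmer TEXT, dist INTEGER)")
    rows = []
    for gene, seq in upstream.items():
        for i in range(len(seq) - k + 1):
            rows.append((gene, seq[i:i + k], len(seq) - i))
    db.executemany("INSERT INTO kmers VALUES (?, ?, ?)", rows)
    db.execute("CREATE INDEX idx_kmer ON kmers (kmer)")
    return db


def positional_summary(db, min_count=2):
    """Per k-mer occurrence count and mean distance to the start site,
    closest-clustering k-mers first (a crude positional-bias ranking)."""
    return db.execute(
        "SELECT kmer, COUNT(*), AVG(dist) FROM kmers "
        "GROUP BY kmer HAVING COUNT(*) >= ? ORDER BY AVG(dist), kmer",
        (min_count,),
    ).fetchall()


# Toy usage with k=3 so the output is easy to check by hand.
db = build_kmer_table({"g1": "TTTTGATAG", "g2": "CCGATAG"}, k=3)
print(positional_summary(db))
```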
We built, from the ground up, a network that uses sound as its medium. Through careful choice of encoding schemes and a reliable transport protocol, we achieved reliable communication at 160 bits per second over distances of up to 15 feet in noisy environments. Our novel congestion control mechanism maintained 50% medium utilization under high load. We also learned a great deal about C# and DirectX sound.
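Reliable transport over a noisy acoustic channel depends on the receiver detecting corrupted frames so they can be discarded and retransmitted. The sketch below shows one common way to do that, a frame with a sequence number, length byte, and CRC-16 (CCITT) checksum; the frame layout and function names are hypothetical and are not our protocol, which was written in C#. It also computes airtime at the 160 bits per second mentioned above, ignoring any modulation preamble.

```python
import binascii
import struct

HEADER = struct.Struct(">BB")  # hypothetical layout: sequence number, payload length
CRC = struct.Struct(">H")      # 16-bit CRC trailer


def encode_frame(seq: int, payload: bytes) -> bytes:
    """Build header + payload + CRC-16/CCITT over header and payload."""
    if not 0 <= seq <= 255 or len(payload) > 255:
        raise ValueError("seq and payload length must each fit in one byte")
    body = HEADER.pack(seq, len(payload)) + payload
    return body + CRC.pack(binascii.crc_hqx(body, 0xFFFF))


def decode_frame(frame: bytes):
    """Return (seq, payload), or None if the frame is truncated or corrupted.
    A sender would retransmit any frame the receiver does not acknowledge."""
    if len(frame) < HEADER.size + CRC.size:
        return None
    body, (crc,) = frame[:-CRC.size], CRC.unpack(frame[-CRC.size:])
    if binascii.crc_hqx(body, 0xFFFF) != crc:
        return None
    seq, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        return None
    return seq, payload


def airtime_seconds(frame: bytes, bits_per_second: float = 160.0) -> float:
    """Time the frame occupies the medium at the given raw bit rate."""
    return len(frame) * 8 / bits_per_second


frame = encode_frame(7, b"hello")   # 2 header + 5 payload + 2 CRC = 9 bytes
print(decode_frame(frame), airtime_seconds(frame))
```

At 160 bits per second even this 9-byte frame holds the medium for about 0.45 s, which is why per-frame overhead and congestion control matter so much on an acoustic link.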

email:
office phone:
office:
aim: pvmania
gtalk: Jonathan.Carlson
mail:
    Redmond, WA 98052-6399