Title: CloudControl: Leveraging many public ChIP-seq control experiments to better remove background noise.

Advisors: Su-In Lee and Larry Ruzzo

Abstract: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a widely used method for determining where various proteins bind on the genome in a population of cells. A typical ChIP-seq protocol involves two experiments: one designed to capture the target ChIP-seq signal (the 'target' experiment) and one designed to capture background noise (the 'control' experiment). A peak-calling algorithm then compares the target data with the control data to determine where the protein of interest binds along the genome. Our approach, named CloudControl, aims to improve the accuracy of peak calling by combining multiple control experiments from a publicly available source such as ENCODE to better remove background noise. To combine existing control data, we perform a regression against a target experiment, treating binned genome positions as samples (up to 32 million) and different control experiments as features (up to 450 from the ENCODE project). The regression fit is then used to generate new control ChIP-seq data, which we refer to as CloudControl data. We use three metrics to evaluate the CloudControl data: (i) the presence of known motifs for the corresponding target protein near called peaks, (ii) reproducibility between pairs of biologically replicated ChIP-seq experiments, and (iii) protein-protein physical interactions inferred from called peaks. On all three metrics, CloudControl data outperform standard control tracks, suggesting that CloudControl can improve on the ordinary control tracks used in the standard ChIP-seq protocol.
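The regression step described in the abstract (genome bins as samples, control experiments as features, the fit becoming a new control track) can be illustrated with a small sketch. This is not the CloudControl implementation: the abstract does not name the regression model, so ordinary least squares via NumPy is our assumption, as are the hypothetical helper names `bin_read_counts` and `cloud_control`, the 100 bp default bin size, and clipping the fitted track at zero.

```python
import numpy as np


def bin_read_counts(read_starts, genome_length, bin_size=100):
    """Count reads per fixed-width bin along one sequence.

    Hypothetical helper: read_starts are 0-based start coordinates, all
    smaller than genome_length. bin_size=100 is an assumed default.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    bins = np.asarray(read_starts, dtype=np.int64) // bin_size
    return np.bincount(bins, minlength=n_bins).astype(float)


def cloud_control(target, controls):
    """Combine control tracks into one synthetic control (sketch).

    target:   shape (n_bins,), binned counts of the target experiment
    controls: shape (n_bins, n_controls), one column per control experiment
    Assumption: plain least squares with no intercept; the abstract does
    not specify the regression model.
    """
    X = np.asarray(controls, dtype=float)
    y = np.asarray(target, dtype=float)
    # Each bin is a sample and each control experiment is a feature.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    # The fitted values form the combined control track; read counts
    # cannot be negative, so clip any negative predictions to zero.
    fitted = np.clip(X @ weights, 0.0, None)
    return weights, fitted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    controls = rng.poisson(5.0, size=(1000, 3)).astype(float)
    target = controls @ np.array([0.5, 0.0, 1.5])
    weights, track = cloud_control(target, controls)
    print(np.round(weights, 3))
```

At the scale quoted in the abstract, a dense float64 design matrix of 32 million bins by 450 controls would occupy roughly 115 GB, so a practical version would likely need a streaming, sparse, or subsampled solver rather than a single `lstsq` call.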

Place: 
CSE 303
When: 
Tuesday, August 9, 2016 - 16:00