Title: Integration of 198 ChIP-seq Datasets Reveals Human cis-Regulatory Regions.
Publication Type: Journal Article
Year of Publication: 2012
Authors: Bolouri H, Ruzzo WL
Journal: Journal of computational biology : a journal of computational molecular cell biology
Start Page: 1
Date or Month Published: Sept

Abstract: We analyzed 198 datasets of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and developed a methodology for identifying high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ∼0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ∼8% of the genome (Ernst et al., Nature 2011, 473, 43-49). Even more remarkably, 85.6% of our selected regions overlap transcription factor (TF) binding regions identified in evolutionarily conserved DNase I hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456-464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715, respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase I hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of all currently available ENCODE ChIP-seq datasets, suggesting that analyzing all currently available datasets is likely to yield further gains.
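The core selection rule in the abstract keeps genomic regions that carry binding peaks in 15 or more experiments. The Python sketch below shows one way such a rule could be applied. It is not the authors' pipeline, because the abstract does not describe how peaks were combined into regions. The half-open `(chrom, start, end)` interval representation, the per-experiment merging step, the base-level coverage sweep, and the function names `merge_intervals`, `recurrent_regions` and `covered_fraction` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: half-open interval [start, end) on a chromosome.
Interval = Tuple[str, int, int]


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting peaks from ONE experiment, so that each
    experiment contributes at most 1 to the coverage count at any base."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def recurrent_regions(experiments: Dict[str, List[Interval]],
                      min_support: int = 15) -> List[Interval]:
    """Return maximal intervals covered by peaks from >= min_support distinct
    experiments (assumed base-level sweep, not the published method)."""
    # chrom -> position -> net change in the number of covering experiments
    events: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for peaks in experiments.values():
        for chrom, start, end in merge_intervals(peaks):
            events[chrom][start] += 1
            events[chrom][end] -= 1

    regions: List[Interval] = []
    for chrom in sorted(events):
        depth = 0
        open_start = None
        # All changes at one position are applied together, so `depth` is the
        # support on [pos, next_pos) and adjacent pieces are not split.
        for pos in sorted(events[chrom]):
            depth += events[chrom][pos]
            if depth >= min_support and open_start is None:
                open_start = pos
            elif depth < min_support and open_start is not None:
                regions.append((chrom, open_start, pos))
                open_start = None
    return regions


def covered_fraction(regions: List[Interval], genome_size: int) -> float:
    """Fraction of a genome of the given size covered by non-overlapping regions."""
    return sum(end - start for _, start, end in regions) / genome_size


if __name__ == "__main__":
    # Toy input with made-up peaks; the real analysis used 198 datasets.
    toy = {
        "tf_A": [("chr1", 100, 200)],
        "tf_B": [("chr1", 150, 250)],
        "tf_C": [("chr1", 180, 300), ("chr1", 190, 210)],
    }
    print(recurrent_regions(toy, min_support=2))  # [('chr1', 150, 250)]
    print(recurrent_regions(toy, min_support=3))  # [('chr1', 180, 200)]
```

Merging each experiment's peaks first is the design choice that makes the count mean "number of experiments" rather than "number of peaks", which is how the abstract phrases its threshold. A `covered_fraction` computed against the genome length corresponds to the kind of figure quoted as ∼0.67% of the genome.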



Alternate Journal: J. Comput. Biol.
Citation Key: 8392
PubMed ID: 22897152