Title: Discriminative motif analysis of high throughput datasets
Publication Type: Journal Article
Year of Publication: 2014
Authors: Yao Z, Macquarrie KL, Fong AP, Tapscott SJ, Ruzzo WL, Gentleman RC
Journal: Bioinformatics
Volume: 30
Issue: 6
Pagination: 775-83
Date or Month Published: Mar 15
ISSN: 1367-4811
Abstract

MOTIVATION: High throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor. It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance.

RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that eliminates systematic biases and scales to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared to DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining, more challenging datasets, we identify common technical or biological factors that compromise the motif search results, and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in the DNA specificity of two similar transcription factors. Lastly, we demonstrate discovery of key transcription factor motifs involved in tissue specification by examining high-throughput DNase accessibility data.
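
As a rough illustration of what "discriminative" motif discovery means, the sketch below ranks k-mers by how much more often they occur in a foreground sequence set than in a background set. It is not the motifRG algorithm, whose details are not given in this abstract. The function names, the fixed k-mer length, the two-proportion z-score, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmers_present(seq, k):
    """Return the set of distinct k-mers found in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminative_kmers(foreground, background, k=6, top=5):
    """Rank k-mers by a two-proportion z-score of per-sequence presence.

    Hypothetical helper: a k-mer scores highly when the fraction of
    foreground sequences containing it is much larger than the fraction
    of background sequences containing it.
    """
    fg_counts, bg_counts = Counter(), Counter()
    for s in foreground:
        fg_counts.update(kmers_present(s, k))
    for s in background:
        bg_counts.update(kmers_present(s, k))

    n_fg, n_bg = len(foreground), len(background)
    scored = []
    for kmer in set(fg_counts) | set(bg_counts):
        p_fg = fg_counts[kmer] / n_fg
        p_bg = bg_counts[kmer] / n_bg
        pooled = (fg_counts[kmer] + bg_counts[kmer]) / (n_fg + n_bg)
        se = sqrt(pooled * (1 - pooled) * (1 / n_fg + 1 / n_bg))
        z = (p_fg - p_bg) / se if se > 0 else 0.0
        scored.append((kmer, z))

    # Highest z-score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda kz: (-kz[1], kz[0]))
    return scored[:top]


# Toy data (made up): every foreground sequence carries the E-box CACGTG.
foreground = ["TTACACGTGAT", "GGCACGTGCCA", "ACACGTGTTTG", "CCTCACGTGAA"]
background = ["TTAGATTACAT", "GGCCTAGGCCA", "ATTTGCATTTG", "CCTAGCTAGAA"]

for kmer, z in discriminative_kmers(foreground, background):
    print(kmer, round(z, 2))
```

Scoring against an explicit background set, rather than a naive base-composition model, is what lets a discriminative approach discount sequence biases shared by both datasets. A practical tool would also need to handle reverse complements, degenerate positions, and multiple-testing correction, all of which this sketch omits.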

AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository.

CONTACT: yzizhen@fhcrc.org.

DOI: 10.1093/bioinformatics/btt615
Downloads: http://www.ncbi.nlm.nih.gov/pubmed/24162561?dopt=Abstract
Month of Publication

Epub 2013 Oct 25

Alternate Journal: Bioinformatics
Citation Key: 9637
PubMed ID: 24162561