Title | Sequence-based heuristics for faster annotation of non-coding RNA families. |
Publication Type | Journal Article |
Year of Publication | 2006 |
Authors | Weinberg Z, Ruzzo WL |
Journal | Bioinformatics (Oxford, England) |
Volume | 22 |
Issue | 1 |
Pagination | 35-39 |
Date or Month Published | 2006 Jan 1 |
ISSN | 1367-4803 |
Keywords | Algorithms; Computational Biology; Genome; Humans; Markov Chains; Models, Statistical; Nucleic Acid Conformation; Proteins; Protein Structure, Secondary; RNA; RNA, Transfer; RNA, Untranslated; ROC Curve; Sensitivity and Specificity; Sequence Alignment; Software |
Abstract | MOTIVATION: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.
RESULTS: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that--unlike family-specific solutions--can scale to hundreds of ncRNA families.
AVAILABILITY: The source code is available under GNU Public License at the supplementary web site. |
DOI | 10.1093/bioinformatics/bti743 |
Downloads | http://www.ncbi.nlm.nih.gov/pubmed/16267089?dopt=Abstract |
Alternate Journal | Bioinformatics |
Citation Key | 1880 |
PubMed ID | 16267089 |