Download the Infernal software package
(infernal.tar.gz, version 0.55) and its "User's
Guide" from here.
Read sections 1-3 of the user guide (possibly skipping "local alignments" on page 12).
Build and install it following the
instructions in section 2 of the manual.
Follow the tutorial steps outlined in section 3 "Getting
Started". Note that the files
tutorial.sto/db/fa referred to there seem to
have been removed or renamed in this release. Use
trna.sto and trna_yeast_phe.fa in place of the first two
(tutorial.sto and tutorial.db, respectively). For the third, either use this
for the cmalign input or make your own as
described at the bottom of page 10.
The cmbuild example builds a model
("my.cm") for tRNA based on 5 or so yeast
tRNAs. With so few sequences and such closely related
ones, it shouldn't surprise you that it's not a very
good model. But it's not useless. I've extracted a
handful of tRNA sequences from the Genbank records for
Pyrococcus furiosus (an anaerobic archaeon
found in 100°C sediments near sea floor vents,
presumably not a close relative of S.
cerevisiae). The sequences are here. Run
cmsearch on this data.
Email me (ruzzo at cs) the output produced
by cmsearch in the step above.
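The build-then-search steps above can be sketched as a small Python wrapper. This is only an illustration, under these assumptions: cmbuild and cmsearch are on your PATH and take their arguments in the order "model file, then input file" (check section 3 of the User's Guide for your release), and the P. furiosus sequences were saved as pfu_trnas.fa, which is a hypothetical file name, as is the output file cmsearch.out.

```python
import subprocess


def infernal_commands(alignment="trna.sto", model="my.cm",
                      database="pfu_trnas.fa"):
    """Return the two command lines: build a model, then search with it.

    "pfu_trnas.fa" is a hypothetical name for the downloaded
    P. furiosus sequences; substitute whatever you saved them as.
    """
    return [
        ["cmbuild", model, alignment],   # assumed usage: cmbuild <cmfile> <alignment>
        ["cmsearch", model, database],   # assumed usage: cmsearch <cmfile> <seqfile>
    ]


def run_pipeline(output_path="cmsearch.out", **names):
    """Run cmbuild, then cmsearch, saving cmsearch's stdout to output_path."""
    build_cmd, search_cmd = infernal_commands(**names)
    subprocess.run(build_cmd, check=True)          # stop if the build fails
    with open(output_path, "w") as out:
        subprocess.run(search_cmd, check=True, stdout=out)


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline() once Infernal is installed.
    for cmd in infernal_commands():
        print(" ".join(cmd))
```

Calling run_pipeline() leaves the search results in cmsearch.out, which is the output to email in the step above.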
Optional, open-ended extension: Note that the
scores produced by this search are not convincingly
high, even given the small size of the "data base" you
just searched. And it missed some of the tRNAs (they
have negative scores). See if you can do better. I.e.,
add some additional tRNA sequences to the samples in
trna.sto, refine the structure annotation as
needed, build a new model via cmbuild, and
rescan the P. furiosus sequences to see if you
can find more of the tRNAs and/or attain higher scores.
This would be most convincing as a strategy for RNA
annotation if the new sequences you add to the model
training set are not closely related to P.
furiosus, but even using a few of those to find the
rest would be interesting. cmalign might be
useful for helping you add new sequences to the
alignment, but it's quite possible that you can do
better manually. If you have the time and patience,
scan more of the genome to see how it does, in terms of
false positives, for example. Perhaps use cmalign again
to align all your P. furiosus hits. Email me (a)
the refined .sto file you created, (b) the results of
using it (or, rather, using the covariance model built
from it) to rescan my small P. furiosus data
set, and (c) a paragraph or two sketching how you went
about refining the alignment.
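When editing the alignment by hand, it is easy to leave a sequence one column short or to unbalance the structure line. The sketch below is a hypothetical helper (not part of Infernal) that runs two basic sanity checks on a Stockholm file before you hand it to cmbuild. It assumes the consensus structure sits on "#=GC SS_cons" lines and that base pairs are written with opening "<", "(", "[", "{" and matching closing ">", ")", "]", "}"; any other character is treated as unpaired. If your copy of trna.sto uses a different pairing convention, adjust the PAIRS table.

```python
# Closing bracket -> the opening bracket it must match (assumed convention).
PAIRS = {">": "<", ")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def read_stockholm(text):
    """Return (sequences, ss_cons), joining interleaved blocks by name."""
    seqs, ss_cons = {}, ""
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "//":
            continue
        if parts[0] == "#=GC" and len(parts) >= 3 and parts[1] == "SS_cons":
            ss_cons += parts[2]
        elif not parts[0].startswith("#") and len(parts) >= 2:
            seqs[parts[0]] = seqs.get(parts[0], "") + parts[1]
    return seqs, ss_cons


def check_alignment(seqs, ss_cons):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    width = len(ss_cons)
    # Check 1: every aligned sequence has as many columns as SS_cons.
    for name, seq in seqs.items():
        if len(seq) != width:
            problems.append(f"{name}: length {len(seq)} != SS_cons length {width}")
    # Check 2: brackets in SS_cons are balanced and properly nested.
    stack = []
    for col, ch in enumerate(ss_cons, start=1):
        if ch in OPENERS:
            stack.append((ch, col))
        elif ch in PAIRS:
            if stack and stack[-1][0] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append(f"column {col}: unmatched '{ch}'")
    problems.extend(f"column {col}: unclosed '{ch}'" for ch, col in stack)
    return problems


# Toy alignment for illustration only; these are not real tRNA sequences.
EXAMPLE = """# STOCKHOLM 1.0
seq1         GGGAAACCC
seq2         GCGAAAGC-
#=GC SS_cons <<<...>>>
//
"""

if __name__ == "__main__":
    sequences, structure = read_stockholm(EXAMPLE)
    print(check_alignment(sequences, structure) or "no problems found")
```

To check your own file, read it with open("your_file.sto").read() and pass the text to read_stockholm. The checks are deliberately minimal: they say nothing about whether the paired columns actually covary, only that the file is internally consistent.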
Please don't hesitate to contact me if you have questions,
problems installing the software, etc.
Durbin, Richard and Eddy, Sean R. and Krogh, Anders and Mitchison, Graeme,
"Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids,"
Cambridge University Press, 1998. Sections 9.5-9.7 and Chapter 10.
Some algorithmic details:
Eddy SR.
A memory-efficient dynamic programming algorithm for optimal
alignment of a sequence to an RNA secondary structure.
BMC Bioinformatics. 2002 Jul 2;3(1):18.
Rfam overview:
Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy,
S.R., Bateman, A.:
Rfam: annotating non-coding RNAs in complete genomes.
Nucleic Acids Res 33 (2005) D121-D124
A good recent survey and comparison of RNA structure prediction:
Two of the biological examples I discussed. The interplay
between computational and experimental approaches is
probably clearer in the 6S papers, and the differences
in the approaches/results are also interesting.
Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, and Breaker:
A Glycine-dependent Riboswitch that Uses
Cooperative Binding to Control Gene Expression in Bacteria.
Science, 2004 Oct 8;306(5694):275-9.
Correction.
Barrick, Sudarsan, Weinberg, Ruzzo and Breaker:
6S RNA is a widespread regulator of eubacterial RNA polymerase
that resembles an open promoter.
RNA. 2005 May;11(5):774-84. Epub 2005 Apr 5.
Willkomm DK, Minnerup J, Huttenhofer A, Hartmann RK.
Experimental RNomics in Aquifex aeolicus: identification of
small non-coding RNAs and the putative 6S RNA homolog.
Nucleic Acids Res. 2005 Apr 6;33(6):1949-60.