University of Washington Computer Science & Engineering
 GS 541, Sp '05: The RNA Guest Lectures, 5/26 & 5/31

Lecture Slides

Homework
  1. Download the Infernal software package (infernal.tar.gz, version 0.55) and its "User's Guide" from here.
  2. Read sections 1-3 of the user guide (possibly skipping "local alignments" on page 12).
  3. Build and install it following the instructions in section 2 of the manual.
  4. Follow the tutorial steps outlined in section 3, "Getting Started". Note that the files tutorial.sto, tutorial.db, and tutorial.fa referred to there seem to have been removed or renamed in this release. Use trna.sto and trna_yeast_phe.fa in place of the first two; for the cmalign input, either use this or make your own as described at the bottom of page 10.
  5. The cmbuild example builds a model ("my.cm") for tRNA based on about five yeast tRNAs. With so few sequences, and such closely related ones, it shouldn't surprise you that it's not a very good model. But it's not useless. I've extracted a handful of tRNA sequences from the GenBank records for Pyrococcus furiosus (an anaerobic archaeon found in 100°C sediments near sea floor vents, presumably not a close relative of S. cerevisiae). The sequences are here. Run cmsearch on this data, using the my.cm model.
  6. Email me (ruzzo at cs) the output produced by cmsearch in the step above.
  7. Optional, open-ended extension: Note that the scores produced by this search are not convincingly high, even given the small size of the "database" you just searched, and that the search missed some of the tRNAs (they have negative scores). See if you can do better: add some additional tRNA sequences to the samples in trna.sto, refine the structure annotation as needed, build a new model via cmbuild, and rescan the P. furiosus sequences to see if you can find more of the tRNAs and/or attain higher scores. This would be most convincing as a strategy for RNA annotation if the new sequences you add to the model training set are not closely related to P. furiosus, but even using a few P. furiosus tRNAs to find the rest would be interesting. cmalign might be useful for helping you add new sequences to the alignment, but it's quite possible that you can do better manually. If you have the time and patience, scan more of the genome to see how the model does, in terms of false positives, for example. Perhaps use cmbuild again to align all your P. furiosus hits. Email me (a) the refined .sto file you created, (b) the results of using it (or, rather, using the covariance model built from it) to rescan my small P. furiosus data set, and (c) a paragraph or two sketching how you went about refining the alignment.
Please don't hesitate to contact me if you have questions, problems installing the software, etc.
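
When hand-editing the alignment for the optional extension in step 7, it is easy to leave a sequence one column short or to unbalance the structure line. The following is a minimal Python sketch of a sanity check you could run on a Stockholm file before handing it to cmbuild. It is not part of Infernal. The function names (parse_stockholm, check_alignment) and the toy alignment are made up for illustration, and the sketch assumes one alignment per file, a "#=GC SS_cons" line, and base pairs written with the nested bracket pairs <>, (), [], {}. It does not interpret pseudoknot letters.

```python
# Hypothetical helper, not part of Infernal: checks a Stockholm alignment
# for ragged rows and unbalanced brackets in the SS_cons line.

PAIRS = {"<": ">", "(": ")", "[": "]", "{": "}"}   # assumed pair symbols
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def parse_stockholm(text):
    """Return (sequences, ss_cons) from the text of a one-alignment file.

    Interleaved blocks are handled by concatenating the pieces per name.
    """
    seqs = {}
    ss_cons = ""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] == "//":
            continue
        if parts[0] == "#=GC" and parts[1] == "SS_cons" and len(parts) >= 3:
            ss_cons += parts[2]
        elif parts[0].startswith("#"):
            continue  # header and other markup lines are ignored
        else:
            seqs[parts[0]] = seqs.get(parts[0], "") + parts[1]
    return seqs, ss_cons


def check_alignment(seqs, ss_cons):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    # Every aligned row must be as long as the structure line.
    for name, seq in seqs.items():
        if len(seq) != len(ss_cons):
            problems.append(
                f"{name}: length {len(seq)} != SS_cons length {len(ss_cons)}"
            )
    # Brackets must nest properly; a stack tracks the open ones.
    stack = []
    for i, ch in enumerate(ss_cons):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in CLOSERS:
            if not stack or stack[-1][0] != CLOSERS[ch]:
                problems.append(f"column {i + 1}: unmatched '{ch}'")
            else:
                stack.pop()
    for ch, i in stack:
        problems.append(f"column {i + 1}: unclosed '{ch}'")
    return problems


# Toy alignment (invented data): seq2 is deliberately one column short.
example = """# STOCKHOLM 1.0
seq1         GGGAAACCC
seq2         GCGAAAGC
#=GC SS_cons <<<...>>>
//
"""

if __name__ == "__main__":
    sequences, structure = parse_stockholm(example)
    for problem in check_alignment(sequences, structure):
        print(problem)
```

To check your own file, read it with open("your_file.sto").read() and pass the text to parse_stockholm. An empty list from check_alignment only means the rows line up and the brackets balance; it says nothing about whether the annotated structure is biologically right.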

References

Some reviews about non-coding RNAs in general:
The fundamentals of covariance models:
Some algorithmic details:
Rfam overview:
A good recent survey and comparison of RNA structure prediction:
Our work on accelerating CM searches:
Two of the biological examples I discussed. The interplay between computational and experimental approaches is probably clearer in the 6S papers, and the differences in the approaches/results are also interesting.

Larry Ruzzo


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX