GS559: Introduction to Statistical and Computational Genomics - Winter 2010

Instructors:
   Jim Thomas, jht@uw.edu (543-7877);
   Larry Ruzzo, ruzzo@uw.edu (543-6298)

Schedule: Hitchcock 220, TuTh 3:30 - 4:50, Jan. 5 -- Mar. 11.

Final Exam: Hitchcock 220, Th 3/18, 4:30--6:20.

Assignments

You are welcome to talk to classmates about principles for solving problems, but do NOT solve specific problems together. In many ways, the problem solving is where you will learn the most for this class, especially the programming.

Problem Set 1 (due Tues. Jan. 12).
Problem Set 2 (due Tues. Jan. 19).
Problem Set 3 (due Tues. Jan. 26).
Problem Set 4 (due Tues. Feb. 2).
Problem Set 5 (due Tues. Feb. 9).
Problem Set 6 (due Thurs. Feb. 25).
Problem Set 7 (due Tues. Mar. 9).
Problem Set 8 (due Tues. Mar. 16).

All problem sets are due by the start of class on the date listed.

TIP - Google your programming problem. For example, "python string search" will get you relevant information on how to search a string pretty easily. Try it for those cryptic error messages, too.
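As an illustration of what a "python string search" query typically turns up, here is a minimal sketch using built-in string operations. The sequence and motif values are made-up example data, not taken from any course file.

```python
# Minimal sketch of searching within a string using built-in operations.
# "sequence" and "motif" are hypothetical example values.
sequence = "ATGCGTACGTTAG"
motif = "ACG"

# The "in" operator reports whether the substring occurs at all.
found = motif in sequence

# find() returns the index of the first match, or -1 if there is none.
first = sequence.find(motif)
missing = sequence.find("GGG")

# count() returns the number of non-overlapping occurrences.
count = sequence.count("CG")

print(found)
print(first)
print(missing)
print(count)
```

Note that find() signals "not found" with -1 rather than an error, so check the result before using it as an index.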

Test/Demo Files

The following files are used in some of the in-class exercises and demos.

Sonnet.txt
Large.txt
Small.txt
Scores.txt
Seq names.txt
Testre.txt
Ex0.py
Ex1.py
Ex2.py
Blast-demo.py



Lectures and Reading

# | Date | Lecture Topic | Programming Topic | Notes & Reading
1 | 01/05 | Overview of course. Introduction to sequence comparison. BLAST, alignment scoring (Slides) | Introduction to Python. Interpreter, objects, types, variables, command line (Slides) | [1, 2]
2 | 01/07 | Sequence alignment - dynamic programming (Slides) | Strings (Slides) |
3 | 01/12 | Sequence alignment - local alignment (Slides) | Numbers, lists, tuples (Slides) | Wikipedia: Smith-Waterman
4 | 01/14 | Sequence alignment - protein score matrices (Slides) | File input-output, if-then-else (Slides) | [3]
5 | 01/19 | Sequence alignment - significance of similarity scores (Slides) | For loops (Slides) | Altschul BLAST statistics tutorial
6 | 01/21 | No lecture | While loops and review of programming (Slides) |
7 | 01/26 | Whole genome alignments, Sequence trees - introduction (Slides) | More on loops (Slides) |
8 | 01/28 | Sequence trees - parsimony and distance (Slides) | Dictionaries (hash maps) (Slides) |
9 | 02/02 | Sequence trees - distance and maximum likelihood (Slides) | Functions, program organization (Slides) | Conceptual overview: slide 9; also see Sample Problem #2
10 | 02/04 | Sequence trees - branch significance, bootstrap (Slides) | Sorting (Slides) |
11 | 02/09 | Motifs (Slides) | Regular expressions (Slides) | [4] Regexp: Tutorial; HowTo; Library Ref
12 | 02/11 | Motifs (Slides) | Regular expressions (Slides) |
13 | 02/16 | Motifs (Slides) | Regular expressions (Slides) |
14 | 02/18 | BLAST (Slides) | Objects (Slides) | [5, 6] Reading: Ch 15-18
15 | 02/23 | Multiple Alignment (Slides) | Objects (Slides) | [7] Wikipedia: Multiple sequence alignment
16 | 02/25 | Gene Prediction (Slides) | Objects (Slides) | [8]
17 | 03/02 | Hidden Markov Models (Slides) | Biopython (Slides) | [9] Biopython, esp. Tutorial & Cookbook
18 | 03/04 | Hidden Markov Models (Slides) | Biopython (Slides) |
19 | 03/09 | Probabilities on pedigrees (Slides) | Exceptions (Slides) | Wikipedia: Genetic linkage plus the section of Strachan & Read cited therein.
20 | 03/11 | RNA (Slides) | Biopython (Slides) | Read either [10] or [11]

References

Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[Offcampus]" links or look at the library "proxy server" instructions.

  1. Noble, WS, "A quick guide to organizing computational biology projects." PLoS Comput. Biol. 5 (2009) e1000424. Pmid: 19649301 [Offcampus]
  2. Dudley, JT and Butte, AJ, "A quick guide for developing effective bioinformatics programming skills." PLoS Comput. Biol. 5 (2009) e1000589. Pmid: 20041221 [Offcampus]
  3. Eddy, SR, "Where did the BLOSUM62 alignment score matrix come from?" Nat. Biotechnol. 22 (2004) 1035-6. Pmid: 15286655 [Offcampus]
  4. Stormo, GD, "DNA binding sites: representation and discovery." Bioinformatics 16 (2000) 16-23. Pmid: 10812473 [Offcampus]
  5. Nicholas, HB, Deerfield, DW and Ropelewski, AJ, "Strategies for searching sequence databases." BioTechniques 28 (2000) 1174-8, 1180, 1182 passim. Pmid: 10868283 [Offcampus]
  6. Pertsemlidis, A and Fondon, JW, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol. 2 (2001) REVIEWS2002. Pmid: 11597340 [Offcampus]
  7. Notredame, C, "Recent evolutions of multiple sequence alignment algorithms." PLoS Comput. Biol. 3 (2007) e123. Pmid: 17784778 [Offcampus]
  8. Harrow, J, Nagy, A, Reymond, A, Alioto, T, Patthy, L, Antonarakis, SE and Guigó, R, "Identifying protein-coding genes in genomic sequences." Genome Biol. 10 (2009) 201. Pmid: 19226436 [Offcampus]
  9. Eddy, SR, "What is a hidden Markov model?" Nat. Biotechnol. 22 (2004) 1315-6. Pmid: 15470472 [Offcampus]
  10. Amaral, PP, Dinger, ME, Mercer, TR and Mattick, JS, "The eukaryotic genome as an RNA machine." Science 319 (2008) 1787-9. Pmid: 18369136 [Offcampus]
  11. Breaker, RR, "Complex riboswitches." Science 319 (2008) 1795-7. Pmid: 18369140 [Offcampus]

Python Resources

General
Regular Expressions
"RegExPal" (For Javascript rather than Python, but similar and quite handy. Try it!)
Biopython
Python Books
Python for Software Design: How to Think Like a Computer Scientist by Allen B. Downey. (Includes early drafts of our textbook; cheaper than the published version, but less polished.)
Learning Python by Mark Lutz. O'Reilly (Very comprehensive. Much is accessible to beginners.)
Dive Into Python 3 by Mark Pilgrim. (Another online book. Based on Python 3, so some differences, and more advanced, but also free.)

Bioinformatics Resources

Books
Biological sequence analysis: probabilistic models of proteins and nucleic acids, R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge. (Excellent reference for probabilistic models including HMMs, SCFGs, alignment, phylogeny.)
Inferring Phylogenies, Joseph Felsenstein, Sinauer, 2004. (Excellent reference on this topic.)
Introduction to Computational Genomics: A Case Studies Approach, Cristianini, Nello & Hahn, Matthew, Cambridge, 2007.
Bioinformatics: Sequence and Genome Analysis, David W. Mount, Cold Spring Harbor Laboratory Press.
Python for Bioinformatics, Sebastian Bassi, CRC Press, 2010. (A little too advanced as a programming book for beginners, but fine now that you're experienced.)
Python for Bioinformatics, Jason Kinser, Jones and Bartlett, 2009. (Ditto.)
Online
Python course in Bioinformatics, Katja Schuerer and Catherine Letondal, 2008, Pasteur Institute. (Another nice online course.)


James H. Thomas
Department of Genome Sciences
University of Washington
jht@uw.edu

Walter L. Ruzzo
Departments of Computer Science and Engineering and Genome Sciences
University of Washington
ruzzo@uw.edu