University of Washington Department of Computer Science & Engineering
 CSE 590 CB, Autumn 2001

 Course Info   

Reading and Research in Computational Biology
Fridays, 10:30 - 11:50, EE 045

CSE 590 CB is a weekly seminar on Reading and Research in Computational Biology, open to all graduate students in the computer, biological, and mathematical sciences.
Organizers:  Larry Ruzzo, Rimli Sengupta, Martin Tompa
Credit: 1-3 (variable)
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email: Email Log (all mail sent to cse590cb@cs; most recent: 01/03/02, 6 PM).
   Send add/remove/change requests to cse590cb-request@cs.
Related Email Lists: these lists are administered by majordomo. To subscribe, either send a message to majordomo@cs.washington.edu whose body (not subject) is the line "subscribe groupname", where groupname is either compbio-group or compbio-seminars, or click the appropriate link below, fill in your correct email address, and send.
   compbio-group@cs: discussions about computational biology. Subscribe.
   compbio-seminars@cs: biology seminar announcements from around campus. Subscribe.
 Schedule
Date | Topic | Presenters/Participants | Papers
10/05 | Organizational Meeting | |
10/12 | BLAST | Tompa, et al. | Papers
10/19 | Random Projection | Tompa |
10/26 | Experimental Design and Statistical Inference for cDNA Microarrays | Kathleen Kerr, Biostatistics | Abstract
11/02-11/09 | Inference from Microarray Knockout Data | Rimli, Larry, et al. | Papers
11/16 | More on random projection | Tompa |
11/22 | Holiday | |
11/29 | Semantically Specialized Databases for Bioinformatics | Nat Goodman |
12/07 | Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction | Don Patterson, Ken Yasuhara, CSE | Abstract

 Papers, etc.

Note on Electronic Access to Journals

Links to full papers below often point to journals that require a paid subscription. The UW Library generally subscribes to these journals, so you can access the articles freely from an on-campus computer. For off-campus access, see the library's "proxy server" instructions.

Topic I: Improved Similarity Search

We will look at algorithms such as BLAST that find sequence similarity by extending short exact word matches. We will consider issues such as the tradeoff between sensitivity and efficiency in choosing that word match.
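To make the "extend short exact word matches" idea concrete, here is a minimal seed-and-extend sketch. It assumes DNA strings, a +1/-1 match/mismatch score, and an X-drop rule for ending each extension; all function names are hypothetical, and this is not NCBI BLAST's implementation (it has no neighborhood words, no gapped extension, and no significance statistics). The word length `w` is the knob in the sensitivity/efficiency tradeoff.

```python
from typing import Dict, List, Tuple


def build_word_index(db: str, w: int) -> Dict[str, List[int]]:
    """Map every length-w word in the database to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(db) - w + 1):
        index.setdefault(db[i:i + w], []).append(i)
    return index


def find_seeds(query: str, db: str, w: int) -> List[Tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where an exact length-w word match occurs."""
    index = build_word_index(db, w)
    seeds = []
    for q in range(len(query) - w + 1):
        for d in index.get(query[q:q + w], []):
            seeds.append((q, d))
    return seeds


def extend_ungapped(query: str, db: str, q: int, d: int, w: int,
                    match: int = 1, mismatch: int = -1,
                    x_drop: int = 3) -> Tuple[int, int, int, int]:
    """Extend a seed in both directions without gaps, using an X-drop cutoff.

    Returns (score, q_start, q_end, d_start); q_end is exclusive.
    """
    best = w * match  # the seed itself is an exact match
    # Extend to the right of the seed.
    s, k, best_right = best, 0, 0
    while q + w + k < len(query) and d + w + k < len(db):
        s += match if query[q + w + k] == db[d + w + k] else mismatch
        k += 1
        if s > best:
            best, best_right = s, k
        elif best - s > x_drop:
            break  # score fell more than x_drop below the best seen so far
    # Extend to the left, starting from the best right-hand endpoint.
    s, k, best_left = best, 0, 0
    while q - k - 1 >= 0 and d - k - 1 >= 0:
        s += match if query[q - k - 1] == db[d - k - 1] else mismatch
        k += 1
        if s > best:
            best, best_left = s, k
        elif best - s > x_drop:
            break
    return best, q - best_left, q + w + best_right, d - best_left


def seed_and_extend(query: str, db: str, w: int, min_score: int,
                    x_drop: int = 3) -> List[Tuple[int, int, int, int]]:
    """Report distinct ungapped hits (q_start, q_end, d_start, score) scoring >= min_score."""
    hits = set()
    for q, d in find_seeds(query, db, w):
        score, qs, qe, ds = extend_ungapped(query, db, q, d, w, x_drop=x_drop)
        if score >= min_score:
            hits.add((qs, qe, ds, score))
    return sorted(hits)
```

For example, `seed_and_extend("ACGTACGTTT", "GGACGTACGTTTGG", w=4, min_score=8)` returns `[(0, 10, 2, 10)]`: several seeds on the same diagonal all extend to the one full-length hit. By contrast, `find_seeds("ACGTACGTACGT", "ACGAACGAACGA", 4)` finds no seeds because the strings differ at every fourth position, while `w=3` finds 9. Shorter words catch more diverged similarities, at the price of more seeds to extend.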

References:

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410. (PDF)

  2. The home page for NCBI BLAST

  3. Ning Z, Cox AJ, Mullikin JC. "SSAHA: A Fast Search Method for Large DNA Databases," Genome Research 2001 Oct;11(10):1725-1729. http://www.genome.org/cgi/content/full/11/10/1725 (PDF)

  4. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research 1997 Sep 1;25(17):3389-3402. http://nar.oupjournals.org/cgi/content/full/25/17/3389 (PDF)

  5. Buhler J. "Efficient large-scale sequence comparison by locality-sensitive hashing," Bioinformatics 2001 May;17(5):419-428. http://bioinformatics.oupjournals.org/cgi/reprint/17/5/419.pdf (PDF)

Topic II: Inference from Microarray Knockout Data

Expression data from knockout experiments should presumably support much stronger inferences about the functional roles of particular genes than would be possible without such perturbations. However, the task is still daunting, both theoretically and practically. We will look at some of the literature on this problem from both perspectives and try to identify promising approaches.

It would be great if some of you wanted to experiment with real data. Some of the data sets commonly used in these studies are the following. Please let us know if you (a) find other data, and/or (b) learn anything interesting from looking at it.

Data Sets:

The "dataset" links above are served by ExpressDB at Church's lab, a great resource for publicly available expression data sets. The "zip" links above are local copies of ZIP archives (from Rosetta) that contain useful supplementary information.

References:

  1. Christopher J. Roberts, Bryce Nelson, Matthew J. Marton, Roland Stoughton, Michael R. Meyer, Holly A. Bennett, Yudong D. He, Hongyue Dai, Wynn L. Walker, Timothy R. Hughes, Mike Tyers, Charles Boone, Stephen H. Friend, "Signaling and Circuitry of Multiple MAPK Pathways Revealed by a Matrix of Global Gene Expression Profiles," Science, vol 287, 4 February 2000, 873-880. Paper. (PDF)

  2. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH. "Functional discovery via a compendium of expression profiles," Cell 2000 Jul 7;102(1):109-26. Paper. (PDF)

  3. Pe'er D, Regev A, Elidan G, Friedman N. "Inferring subnetworks from perturbed expression profiles," Bioinformatics 2001 Jun;17 Suppl 1:S215-S224. Paper. (PDF)

Guest Lectures

10/26:  Experimental Design and Statistical Inference for cDNA Microarrays

Kathleen Kerr, UW Biostatistics

Abstract: Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for large-scale analysis of gene expression. Using this technology, geneticists can simultaneously study the relative expression levels of thousands of genes in two or more cell populations. As the potential of this technology has become apparent, many important and interesting statistical questions persist. The two-dye system is integral to the technology, and it is common to summarize the two fluorescent readings from a spot with their ratio. This reduction loses some of the useful information in the data. Furthermore, one must account for multiple sources of variation in microarray data. This is commonly presented as the problem of data "normalization." I will discuss analysis of variance (ANOVA) techniques that integrate normalization into the data analysis so that it is done systematically and the degrees of freedom are explicitly acknowledged. Rather than relying on ratios, ANOVA models use the full amount of information in the data to get the best estimate of relative gene expression. This analytical framework allows one to consider alternative designs for microarray experiments that make more efficient use of scarce resources and produce more precise estimation of the quantities of interest. The underlying theme of this research is to incorporate rigorous statistical inference into microarray studies.
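As an illustration only: the abstract does not spell out a model, so the factors below (array A, dye D, variety or cell population V, gene G) and the particular interaction terms are assumptions. An additive ANOVA model of the kind described writes the log intensity of each spot as a sum of effects instead of reducing the two readings to a ratio:

```latex
\log y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \varepsilon_{ijkg}
```

Here y_{ijkg} is the intensity measured on array i with dye j for variety k and gene g. In this illustrative model the array and dye terms absorb effects that are otherwise handled by a separate normalization step, and the contrast (VG)_{1g} - (VG)_{2g} estimates the log relative expression of gene g between populations 1 and 2.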


12/07  Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction

Donald J. Patterson, Ken Yasuhara, UW CSE

Abstract: Accurate splice site prediction is a critical component of any computational approach to gene prediction in higher organisms. Existing approaches generally use sequence-based models that capture local dependencies among nucleotides in a small window around the splice site. We present evidence that computationally predicted secondary structure of moderate-length pre-mRNA subsequences contains information that can be exploited to improve acceptor splice site prediction beyond that possible with conventional sequence-based approaches. Both decision tree and support vector machine classifiers, using folding energy and structure metrics characterizing helix formation near the splice site, achieve a 5-10% reduction in error rate with a human data set. Based on our data, we hypothesize that acceptors preferentially exhibit short helices at the splice site.


 Other Seminars
Applied Math Department Mathematical Biology Journal Club
Biochemistry Department Seminars
COMBI Seminars (MBT 599C)
Genetics Department Seminars
Microbiology Department Seminars
Molecular Biotechnology Department Seminars
Zoology 525, Mathematical Biology Seminar Series

 Resources
MBT 599 (aka MBT/GENET 540/541) Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis (Winter/Spring 2001)
CSE 590 CB, Winter, 2001.
CSE 590 CB, Autumn, 2000.
CSE 590 CB, Spring, 2000.
CSE 590 CB, Winter, 2000.
CSE 590 CB, Autumn, 1999.
CSE 590 CB, Spring, 1999.
CSE 590 CB, Winter, 1999.
CSE 590 CB, Autumn, 1998.
Lecture notes from CSE 527 (Computational Biology) (formerly known as CSE 590 BI).


Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse590cb-webmaster@cs.washington.edu]