Reading and Research in Computational Biology
CSE 590 CB is a weekly seminar on Readings and Research in
Computational Biology, open to all graduate students in the computer,
biological, and mathematical sciences.
| Organizers: | Larry Ruzzo, Rimli Sengupta, Martin Tompa |
| Credit: | 1-3 Variable |
| Grading: | Credit/No Credit. Talk to the organizers if you are unsure of our expectations. |
| Date | Topic | Presenters/Participants | Papers |
|---|---|---|---|
| 10/05 | Organizational Meeting | | |
| 10/12 | BLAST | Tompa, et al. | Papers |
| 10/19 | Random Projection | Tompa | |
| 10/26 | Experimental Design and Statistical Inference for cDNA Microarrays | Kathleen Kerr, Biostatistics | Abstract |
| 11/02-11/09 | Inference from Microarray Knockout Data | Rimli, Larry, et al. | Papers |
| 11/16 | More on Random Projection | Tompa | |
| 11/22 | Holiday | | |
| 11/29 | Semantically Specialized Databases for Bioinformatics | Nat Goodman | |
| 12/07 | Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction | Don Paterson, Ken Yasuhara, CSE | Abstract |
Note on Electronic Access to Journals
10/12: BLAST
We will look at algorithms such as BLAST that find sequence similarity by extending short exact word matches. We will consider issues such as the tradeoff between sensitivity and efficiency in how that word match is chosen (for example, its length).
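The seed-and-extend idea above can be sketched in a few lines of Python. This is a minimal illustration, not BLAST itself: it extends seeds only while characters match exactly, whereas BLAST scores the extension and allows mismatches. The function names and the word length parameter `w` are hypothetical, chosen for this example.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, qi, si, w):
    """Grow an exact seed match left and right while the characters agree.

    Simplifying assumption: extension stops at the first mismatch.
    """
    left = 0
    while (qi - left > 0 and si - left > 0
           and query[qi - left - 1] == subject[si - left - 1]):
        left += 1
    right = w
    while (qi + right < len(query) and si + right < len(subject)
           and query[qi + right] == subject[si + right]):
        right += 1
    # (start in query, start in subject, length of the extended match)
    return qi - left, si - left, left + right


def seed_and_extend(query, subject, w):
    """Find exact length-w seeds shared by both sequences and extend each."""
    index = index_words(subject, w)
    hits = set()  # several seeds can extend to the same match
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], ()):
            hits.add(extend(query, subject, qi, si, w))
    return sorted(hits)


if __name__ == "__main__":
    query, subject = "ACGTACGT", "TTACGTAGG"
    print(seed_and_extend(query, subject, 4))  # [(0, 2, 5), (3, 1, 5)]
    print(seed_and_extend(query, subject, 6))  # []
```

In this toy input the two sequences share matches of length 5. A word length of 4 finds both, while a word length of 6 has fewer candidate seeds to examine but finds nothing. That is the sensitivity versus efficiency tradeoff in miniature.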
11/02-11/09: Inference from Microarray Knockout Data
Expression data from knockout experiments should presumably allow much stronger inference about the functional role of particular genes than would otherwise be possible. However, the task is still daunting, both theoretically and practically. We're going to look at some of the literature on this problem from both perspectives, and try to identify promising approaches.
It would be great if any of you wanted to experiment on some real data. Some of the data sets that have been commonly used in these studies are the following. Please let us know if you (a) find other data, and/or (b) learn anything interesting from looking at it.
10/26: Experimental Design and Statistical Inference for cDNA Microarrays (Kathleen Kerr, Biostatistics)
Abstract: Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for large-scale analysis of gene expression. Using this technology, geneticists can simultaneously study the relative expression levels of thousands of genes in two or more cell populations. As the potential of this technology has become apparent, many important and interesting statistical questions persist. The two-dye system is integral to the technology, and it is common to summarize the two fluorescent readings from a spot with their ratio. This reduction loses some of the useful information in the data. Furthermore, one must account for multiple sources of variation in microarray data. This is commonly presented as the problem of data "normalization." I will discuss analysis of variance (ANOVA) techniques that integrate normalization into the data analysis so that it is done systematically and the degrees of freedom are explicitly acknowledged. Rather than relying on ratios, ANOVA models use the full amount of information in the data to get the best estimate of relative gene expression. This analytical framework allows one to consider alternative designs for microarray experiments that make more efficient use of scarce resources and produce more precise estimation of the quantities of interest. The underlying theme of this research is to incorporate rigorous statistical inference into microarray studies.
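To make the contrast with per-spot ratios concrete, here is one form an ANOVA model for a two-dye experiment could take. This is a sketch under assumptions: the abstract does not state the model, and the symbols and choice of terms below are illustrative rather than the speaker's formulation. Here $y_{ijkg}$ is the measured intensity for array $i$, dye $j$, variety (cell population) $k$, and gene $g$.

```latex
\log y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \epsilon_{ijkg}
```

In a model of this shape, the main effects $A_i$, $D_j$, and $V_k$ absorb the array, dye, and variety variation that "normalization" would otherwise remove by hand. The variety-by-gene interactions $(VG)_{kg}$ carry the quantity of interest: a difference such as $(VG)_{1g} - (VG)_{2g}$ estimates the relative expression of gene $g$ between two populations.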
12/07: Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction (Don Paterson, Ken Yasuhara, CSE)
Abstract: Accurate splice site prediction is a critical component of any computational approach to gene prediction in higher organisms. Existing approaches generally use sequence-based models that capture local dependencies among nucleotides in a small window around the splice site. We present evidence that computationally predicted secondary structure of moderate-length pre-mRNA subsequences contains information that can be exploited to improve acceptor splice site prediction beyond that possible with conventional sequence-based approaches. Both decision tree and support vector machine classifiers, using folding energy and structure metrics characterizing helix formation near the splice site, achieve a 5-10% reduction in error rate with a human data set. Based on our data, we hypothesize that acceptors preferentially exhibit short helices at the splice site.
Department of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350. (206) 543-1695 voice, (206) 543-2969 FAX. Comments to cse590cb-webmaster@cs.washington.edu