University of Washington Department of Computer Science & Engineering
 CSE 590 CB, Winter 2001

 Course Info   

Reading and Research in Computational Biology
Wednesdays, 3:30 - 5:00, MGH 074

CSE 590 CB is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in the computer, biological, and mathematical sciences.
Instructors:  Larry Ruzzo, Rimli Sengupta
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to an instructor if you are unsure of our expectations.
 Email: Email Log (All mail sent to cse590cb@cs. Last: 03/01/01, 9 PM.)
   Add/remove/change requests to cse590cb-request@cs.
Related Email Lists (majordomo administered)
   compbio-group@cs: discussions about computational biology. Subscribe.
   compbio-seminars@cs: biology seminar announcements from around campus. Subscribe.
 Schedule
Date | Topic | Presenters/Participants | Papers
1/03 | Organizational Meeting | |
1/10 | Guest Speaker: "From gene expression data to cancer class discovery" | Amir Ben-Dor, Agilent Laboratories | Abstract
1/17 | "Nonlinear PCA" | Tammy, Gidon; Larry | Papers
1/24 | UTR Reconstruction | Chris, Jochen; Larry | Paper (ISMB 2000)
1/31 | "Two-hybrid screens and group testing" | Zasha |
2/07 | Guest Speaker: "Plaid models for microarray data" | Art Owen, Stanford | Abstract
2/14 | Guest Speaker: "Comparison of prokaryotic genomes using colinear regions" | Joao Carlos Setubal, Instituto de Computacao - UNICAMP | Abstract
2/21 | "Separating real motifs from their artifacts" | Mathieu, Saurabh | Abstract
2/28 | Guest Speaker: "Protein structure prediction: progress and prospects" | Ram Samudrala, UW Microbiology | Abstract
3/07 | Guest Speaker: "A Statistical Modeling Approach for Analyzing Microarray Data --- Questions, Issues and Challenges" | Lue Ping Zhao, FHCRC, Biostatistics | Abstract

 Papers, etc.

1/10:  From gene expression data to cancer class discovery

Amir Ben-Dor, Agilent Laboratories

Recent studies demonstrate the discovery of putative disease subtypes from gene expression data. The underlying computational problem is to partition the set of sample tissues into statistically meaningful classes. We approach this problem by statistically scoring candidate partitions according to the overabundance of genes that separate the different classes. (Overabundance is measured against a stochastic null model.) Using simulated annealing we explore the space of all possible partitions of the set of samples, seeking high-scoring partitions. We demonstrate the performance of our methods on both synthetic data, where we recover planted partitions, and on tumor expression datasets, where we find several highly pronounced partitions. Joint work with Nir Friedman and Zohar Yakhini. To appear, RECOMB 2001. A shorter talk was presented at IPAM.

Preprints of some of this work are available from his co-author's web site. Two papers of interest are:

  • Recomb01 paper: "Class Discovery in Gene Expression Data". (This is mainly what Amir talked about.)
  • JCB submission: "Tissue Classification with Gene Expression Profiles". This is the paper that has the derivation of the closed form for the p-value of their TNoM score.

There's also a paper on gene scoring methods (technical report, pdf file) at http://www.labs.agilent.com/resources/techreports.html .
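
For readers who have not met the TNoM score mentioned above, here is a minimal sketch, assuming it stands for "threshold number of misclassifications": for one gene, the fewest samples that any single expression threshold (in either direction) puts in the wrong class. The function name `tnom_score` and the 0/1 label encoding are our own choices for illustration. The closed-form p-value derived in the JCB submission is not reproduced here.

```python
from typing import Sequence


def tnom_score(values: Sequence[float], labels: Sequence[int]) -> int:
    """Fewest misclassified samples over all threshold rules for one gene.

    values: expression levels of the gene, one per sample.
    labels: class of each sample, encoded as 0 or 1 (an assumption of this sketch).
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_ones = sum(label for _, label in pairs)
    total_zeros = n - total_ones

    # Threshold below every sample: one rule calls everything class 1,
    # the opposite rule calls everything class 0.
    best = min(total_ones, total_zeros)

    ones_below = 0
    zeros_below = 0
    for i, (value, label) in enumerate(pairs):
        if label == 1:
            ones_below += 1
        else:
            zeros_below += 1
        # A threshold can only sit between two distinct expression values.
        if i + 1 < n and pairs[i + 1][0] == value:
            continue
        # Rule "class 1 above the threshold": errors are the class-1 samples
        # below it plus the class-0 samples above it.
        errors_high_is_one = ones_below + (total_zeros - zeros_below)
        # The opposite rule misclassifies exactly the remaining samples.
        best = min(best, errors_high_is_one, n - errors_high_is_one)
    return best
```

A gene that separates the two classes perfectly scores 0; a gene with interleaved classes scores higher, so low scores mark the informative genes.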


1/17:  "Nonlinear PCA"

Following up on last quarter's look at Principal Component Analysis for analysis of gene expression microarray data, here are two recent papers proposing related ideas for discovery of nonlinear structures in high-dimensional spaces.

References:


2/7:   Plaid models for microarray data

Art Owen, Statistics, Stanford University

Abstract: This talk describes the plaid model, a tool for exploratory analysis of multivariate data. The motivating application is the search for interpretable biological structure in gene expression microarray data. Interpretable structure can mean that a set of genes has a similar expression pattern, in the samples under study, or in just a subset of them (such as the cancerous samples).

A set of genes behaving similarly in a set of samples defines what we call a "layer". Layers are very much like clusters, except that genes can belong to more than one layer or to none of them, a layer may be defined with respect to only a subset of the samples, and the roles of genes and samples are symmetric in our formulation.

The plaid model is a superposition of two-way ANOVA models, each defined over subsets of genes and samples. This talk will present the plaid model, an interior point style algorithm for fitting it, and some examples from yeast DNA arrays and other problems.

This is joint work with Laura Lazzeroni.

References: 1, 2.
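
To make the "superposition of two-way ANOVA models" in the abstract concrete, here is a minimal sketch of the model form only. It assumes each layer contributes an additive term (layer mean + gene effect + sample effect) to every cell whose gene and sample both belong to that layer, on top of a constant background. The `Layer` class, the `background` term, and the function name are hypothetical names for illustration; the interior point style fitting algorithm from the talk is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    mu: float                         # layer-wide mean effect
    gene_effects: Dict[int, float]    # gene index -> row effect; keys define membership
    sample_effects: Dict[int, float]  # sample index -> column effect; keys define membership


def plaid_fitted_values(
    n_genes: int, n_samples: int, background: float, layers: List[Layer]
) -> List[List[float]]:
    """Return the gene-by-sample matrix implied by a set of layers."""
    fitted = [[background] * n_samples for _ in range(n_genes)]
    for layer in layers:
        # Only cells inside the layer (member gene AND member sample) are touched,
        # so a layer may cover just a subset of the samples.
        for i, alpha in layer.gene_effects.items():
            for j, beta in layer.sample_effects.items():
                fitted[i][j] += layer.mu + alpha + beta
    return fitted


# Two overlapping layers on a 3 x 3 matrix: gene 1 and sample 1 belong to both.
layer_a = Layer(mu=2.0, gene_effects={0: 0.5, 1: -0.5}, sample_effects={0: 0.0, 1: 0.0})
layer_b = Layer(mu=-1.0, gene_effects={1: 0.0, 2: 0.0}, sample_effects={1: 0.0, 2: 0.0})
fitted = plaid_fitted_values(3, 3, background=1.0, layers=[layer_a, layer_b])
```

Cells in the overlap receive the sum of both layers' contributions, and cells in no layer keep the background value, which is how this differs from an ordinary clustering.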


2/14:   Comparison of prokaryotic genomes using colinear regions

Joao Carlos Setubal, Instituto de Computacao - UNICAMP

Abstract: Now that there are dozens of prokaryotic genomes publicly available (soon to become hundreds or even thousands), there is a major drive towards gaining full advantage of the information provided by such complete genomes. One way to compare genomes is to look for runs of corresponding consecutive genes (colinear regions). Most of these runs, when properly clustered, contain genes that are functionally related, so the cluster analysis can help the understanding of metabolic pathways as well as provide clues to functions of hypothetical genes. In this talk I will review previous work that has been done along these lines and describe data on such runs that I have collected from comparisons of five bacterial genomes.
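
As a rough illustration of "runs of corresponding consecutive genes", here is a minimal sketch under simplifying assumptions that are ours, not the speaker's: each genome is an ordered list of gene names, correspondence is given as a one-to-one dictionary `orthologs` from genes of the first genome to genes of the second, a run may be in the same or in reversed order, and no gaps are tolerated. Real comparisons would presumably need to relax the last point.

```python
from typing import Dict, List, Optional


def colinear_runs(
    genome_a: List[str],
    genome_b: List[str],
    orthologs: Dict[str, str],
    min_length: int = 2,
) -> List[List[str]]:
    """Maximal runs of adjacent genes in genome_a whose counterparts are adjacent in genome_b."""
    position_in_b = {gene: index for index, gene in enumerate(genome_b)}
    # Position in genome_b of each gene of genome_a, or None without a counterpart.
    mapped: List[Optional[int]] = [
        position_in_b.get(orthologs.get(gene)) for gene in genome_a
    ]

    runs: List[List[str]] = []
    start = 0
    step = 0  # +1 or -1 once the run's direction is known, 0 before that
    for i in range(1, len(genome_a) + 1):
        extends = False
        if i < len(genome_a) and mapped[i] is not None and mapped[i - 1] is not None:
            delta = mapped[i] - mapped[i - 1]
            if delta in (1, -1) and step in (0, delta):
                extends = True
                step = delta
        if not extends:
            # The run covering genome_a[start:i] has ended; keep it if long enough.
            if mapped[start] is not None and i - start >= min_length:
                runs.append(genome_a[start:i])
            start = i
            step = 0
    return runs


genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["b3", "b2", "b1", "bx", "b5", "b6"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a5": "b5", "a6": "b6"}
runs = colinear_runs(genome_a, genome_b, orthologs)
```

Here `a1..a3` form a reversed run and `a5, a6` a forward run, while `a4` has no counterpart and breaks the region between them.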


2/21:   Separating real motifs from their artifacts

Mathieu Blanchette & Saurabh Sinha, CSE, University of Washington

Abstract: The typical output of many computational methods to identify binding sites is a long list of motifs containing some real motifs (those most likely to correspond to the actual binding sites) along with a large number of random variations of these. We present a statistical method to separate real motifs from their artifacts. This produces a short list of high quality motifs that is sufficient to explain the over-representation of all motifs in the given sequences. Using synthetic data sets, we show that the output of our method is very accurate. On various sets of upstream sequences in S. cerevisiae, our program identifies several known binding sites, as well as a number of significant novel motifs.


2/28:   Protein structure prediction: progress and prospects

Ram Samudrala, Microbiology, University of Washington

Abstract: The Critical Assessment of protein Structure Prediction (CASP) methods conference was instigated to ensure that protein structure prediction approaches are tested rigorously without advance knowledge of the experimental answer. We have made predictions at all four CASP meetings, each time improving upon previously developed methodologies. In the recent CASP4 experiment, we made predictions in all three prediction categories: comparative modelling, fold recognition, and ab initio prediction. The talk will focus on the performance of our prediction methodologies in the context of ongoing structural genomics efforts.


3/7:   A Statistical Modeling Approach for Analyzing Microarray Data --- Questions, Issues and Challenges

Lue Ping Zhao, FHCRC, Biostatistics

Abstract: Functional genomic studies are now routinely conducted by biomedical researchers, generating a huge amount of expression data. Preliminary exploration of such data has yielded much useful information, and has also indicated many challenges facing the continuing success of functional genomics. One objective of this talk is to identify some of the challenging issues. Another objective is to identify additional research questions one can address in the analysis. Lastly, I will describe a statistical modeling approach that we have developed for analyzing microarray data. To illustrate the methodology, we apply it to the analysis of a leukemia data set, some results from which will be highlighted in this talk.



 Other Seminars
Applied Math Department Mathematical Biology Journal Club
Biochemistry Department Seminars
COMBI Seminars (MBT 599C)
Genetics Department Seminars
Microbiology Department Seminars
Molecular Biotechnology Department Seminars
Zoology 525, Mathematical Biology Seminar Series

 Resources
MBT 599 Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis (Winter Quarter 2001)
CSE 590 CB, Autumn, 2000.
CSE 590 CB, Spring, 2000.
CSE 590 CB, Winter, 2000.
CSE 590 CB, Autumn, 1999.
CSE 590 CB, Spring, 1999.
CSE 590 CB, Winter, 1999.
CSE 590 CB, Autumn, 1998.
Lecture notes from CSE 527 (Computational Biology) (formerly known as CSE 590 BI).


Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
cse590cb-webmaster@cs.washington.edu