CSE Comp Bio group logo University of Washington Department of Computer Science & Engineering
 CSE 590 CB, Autumn 2002
  CSE Home  About Us    Search    Contact Info 

 Course Info   

Reading and Research in Computational Biology
Mondays, 3:30 - 4:50, EE1 026

CSE 590 CB is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in the computer, biological, and mathematical sciences.
Organizers:  Bill Noble, Martin Tompa
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email Email Log (All mail sent to cse590cb@cs. Last: [an error occurred while processing this directive].)
   Add/remove/change requests to cse590cb-request@cs .

Related Email Lists:
   compbio-seminars@cs.washington.edu: biology seminar announcements from all around campus.
   compbio-group@cs.washington.edu: discussions about computational biology.
To subscribe, send a message to
  majordomo@cs.washington.edu
whose body (not subject) is the line
  subscribe groupname
where groupname is either compbio-group or compbio-seminars

 Schedule
Date Presenters/Participants Topic Papers
09/30 Bill Noble, Martin Tompa Organization; description of motif research problem Papers & Tools
10/07 Amol Prakash, Martin Tompa Binding sites in coding regions Paper
10/14 Phil Green Signal and noise in genome sequences  
10/21 Dan Gusfield, UC Davis Combinatorial approaches to haplotyping Paper & Abstract
10/28 Brian Tjaden, Martin Tompa Segment match refinement Paper
11/04 Debraj GuhaThakurta, Rosetta Inpharmatics Identification of Regulatory Elements Abstract
11/11   Holiday  
11/18 Emily Rocke, Gidon Shavit, Zizhen Yao, Bill Noble Transcriptional modules Paper
11/25 Emily Rocke, Gidon Shavit, Zizhen Yao, Bill Noble Transcriptional modules Paper
12/02 Dan Grossman, Zasha Weinberg, Bill Noble Genome-scale data types Papers
12/09 All Results on motif research problem  

 Papers, etc.

  Note on Electronic Access to Journals

Links to full papers below are often to journals that require a paid subscription. The UW Library is generally a paid subscriber, and you can freely access these articles if you do so from an on-campus computer. For off-campus access, look at the library "proxy server" instructions.  


9/30:

Some background on motifs and regulatory elements:
  1. Lectures 7-10 of CSE 527, Winter 2000,Computational Biology
  2. Ohler and Niemann, Identification and analysis of eukaryotic promoters: recent computational approaches , Trends in Genetics, Volume 17, Issue 2, 1 February 2001, Pages 56-60
  3. Fickett and Wasserman, Discovery and modeling of transcriptional regulatory regions , Current Opinion in Biotechnology, Volume 11, Issue 1, 1 February 2000, Pages 19-24

Some available tools for discovery of motifs:

  1. MEME
  2. Gibbs sampler
  3. RSA tools
  4. YMF
  5. FootPrinter


10/07:

Blanchette,
A comparative analysis method for detecting binding sites in coding regions


10/21:

Bafna, Gusfield, Lancia, and Yooseph,
Haplotyping as perfect phylogeny: a direct approach

Abstract:

The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. However, most studies will collect genotype data rather than haplotype data, requiring the deduction of haplotype information from genotype data. Input to the haplotyping problem is a set of N genotypes, and output is N haplotype pairs that "explain" the genotypes. Several computational approaches to this problem have been developed and used. Most of these use a statistical framework, such as MLE, or statistical computing methods such as EM, Gibbs sampling etc.

We have developed several distinct combinatorial approaches to the haplotyping problem. I will talk about a few of the most recent of these approaches. One approach, the "pure parsimony" approach is to find N pairs of haplotypes, one for each genotype, that explain the N genotypes and MINIMIZE the number of distinct haplotypes used. Solving this problem is NP-hard, however, for reasonable size data (larger than in general use today), the "pure-parsimony" solution can be efficiently found in practice. I will also talk about an approach that mixes pure-parsimony with Clark's subtraction method for haplotyping. Simulations show that the efficiently of both methods depends positively on the level of recombination - the more recombination, the more efficiency, but the accuracy depends inversely on the level of recombination. I will also discuss a simple way to boost the accuracy of Clark's method.

The main emphasis will be on my most recent approach, based on viewing the haplotyping problem in the context of perfect phylogeny. The biological motivation for this is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected. This, along with assumption of infinite sites implies that any "permitted" solution to the haplotyping problem should fit a perfect phylogeny, This is a severe combinatorial constraint on the permitted solutions to the haplotyping problem, and it leads to efficient deterministic algorithms (I will talk about two of them) to deduce all features of the permitted haplotype solution(s) that can be known with certainty. We obtain a) an efficient algorithm to find (if one exists) one permitted solution to the haplotype problem; b) a simple test to determine if it is the unique permitted solution; c) an efficient way to count the number of permitted solutions; d) and an efficient way to implicitly represent the set of all permitted solutions so that each can be efficiently created. As a by-product, we prove that the number of permitted solutions is bounded by 2^k, where k is less than or equal to the number of sites. This is a huge reduction from the number of solutions if we do not require that a solution fit a perfect phylogeny. This implicit representation also allows simple solutions to secondary questions about the genotypes. For example, in the case that the tree is not unique, the representation explicitly reveals which phase relationships (if determined) would reduce the number of possible trees, and gives a conceptual solution to the problem of minimizing the number of haplotype pairs needed to be laboratory-determined in order to deduce the correct tree. We also observe a strong phase-transition (no pun intended) in the probability that a given number of genotypes will be sufficient to uniquely determine the tree.

Parts of this work are joint with different collaborators, including R.H. Chung, V. Bafna, G. Lancia, S. Orzack and S. Yooseph


10/28:

Halpern, Huson, and Reinert,
Segment match refinement and applications


11/04:

Title:
"Identification of Transcription Regulatory Elements in Eukaryotic Genomes"

Abstract:
Transcriptional activation in eukaryotic organisms normally requires combinatorial interactions of multiple transcription factors. Though several methods exist for identification of individual protein binding site patterns in DNA sequences, there are few methods for the discovery of binding site patterns for cooperatively acting factors. A new method, Co-Bind (for COoperative BINDing) has been developed for this purpose (1). This method and some results will be discussed.

From a set of muscle genes in C. elegans we identified several putative muscle-specific regulatory elements using DNA pattern recognition methods. Experimental verification of these regulatory motifs is being pursued (2). Through computational and experimental methods we have recently identified a new transcription regulatory element involved in C. elegans in heat-shock response that seem to act in cooperation with the previously well established HSF regulatory factor (3). Some data from these projects will be presented.

References:

  1. GuhaThakurta, D., and Stormo, G.D. (2001) Identifying target sites for cooperatively binding factors. Bioinformatics, 17, 608-621.
  2. GuhaThakurta, D., Schriefer, L.A., Hresko, M., Waterston, R.H., and Stormo, G.D. (2001) Identifying muscle regulatory elements and genes in the nematode Caenorhabditis elegans. Proceedings of the 7th Pacific Symposium on Biocomputing (PSB 2002). 425-436.
  3. GuhaThakurta D, Palomar L, Stormo GD, Tedesco P, Johnson TE, Walker DW, Lithgow G, Kim S, Link CD. Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. (2002) 12, 701-712.


11/18:

Berman et al.,
Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome

Markstein et al., Genome-wide analysis of clustered Dorsal binding site identifies putative target genes in Drosophila embryo


11/25:

Frith, Hansen and Weng,
Detection of cis-element clusters in higher eukaryotic DNA


12/02:

Glaever et al., Functional profiling of the S. cerevisiae genome
von Mering et al., Comparative assessment of large-scale data sets of protein-protein interactions



 Other  Seminars COMBI & Genome Sciences Seminars
Applied Math Department Mathematical Biology Journal Club
Biostatistics Seminars
Biochemistry Department Seminars
Microbiology Department Seminars
Zoology 525, Mathematical Biology Seminar Series

 Resources Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
S-Star Bioinformatics Online Course Schedule, a collection of video primers
CSE 527: Computational Biology
MBT/GENET 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology


CSE logo Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse590cb-webmaster@cs.washington.edu]