|
CSE Home |
About Us |
Search |
Contact Info |
| Course Info |
Reading and Research in Computational Biology
CSE 590 CB is a weekly seminar on Readings and Research in
Computational Biology, open to all graduate students in the computer,
biological, and mathematical sciences.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Organizers: | Bill Noble, Martin Tompa |
| Credit: | 1-3 Variable |
| Grading: | Credit/No Credit. Talk to the organizers if you are unsure of our expectations. |
Related Email Lists:
compbio-seminars@cs.washington.edu:
biology seminar announcements from all around campus.
compbio-group@cs.washington.edu:
discussions about computational biology.
To subscribe, send a message to
majordomo@cs.washington.edu
whose body (not subject) is the line
subscribe groupname
where groupname is either
compbio-group or compbio-seminars
| Date | Presenters/Participants | Topic | Papers |
|---|---|---|---|
| 09/30 | Bill Noble, Martin Tompa | Organization; description of motif research problem | Papers & Tools |
| 10/07 | Amol Prakash, Martin Tompa | Binding sites in coding regions | Paper |
| 10/14 | Phil Green | Signal and noise in genome sequences | |
| 10/21 | Dan Gusfield, UC Davis | Combinatorial approaches to haplotyping | Paper & Abstract |
| 10/28 | Brian Tjaden, Martin Tompa | Segment match refinement | Paper |
| 11/04 | Debraj GuhaThakurta, Rosetta Inpharmatics | Identification of Regulatory Elements | Abstract |
| 11/11 | Holiday | ||
| 11/18 | Emily Rocke, Gidon Shavit, Zizhen Yao, Bill Noble | Transcriptional modules | Paper |
| 11/25 | Emily Rocke, Gidon Shavit, Zizhen Yao, Bill Noble | Transcriptional modules | Paper |
| 12/02 | Dan Grossman, Zasha Weinberg, Bill Noble | Genome-scale data types | Papers |
| 12/09 | All | Results on motif research problem | |
Note on Electronic Access to Journals
Some available tools for discovery of motifs:
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. However, most studies will collect genotype data rather than haplotype data, requiring the deduction of haplotype information from genotype data. Input to the haplotyping problem is a set of N genotypes, and output is N haplotype pairs that "explain" the genotypes. Several computational approaches to this problem have been developed and used. Most of these use a statistical framework, such as MLE, or statistical computing methods such as EM, Gibbs sampling etc.
We have developed several distinct combinatorial approaches to the haplotyping problem. I will talk about a few of the most recent of these approaches. One approach, the "pure parsimony" approach is to find N pairs of haplotypes, one for each genotype, that explain the N genotypes and MINIMIZE the number of distinct haplotypes used. Solving this problem is NP-hard, however, for reasonable size data (larger than in general use today), the "pure-parsimony" solution can be efficiently found in practice. I will also talk about an approach that mixes pure-parsimony with Clark's subtraction method for haplotyping. Simulations show that the efficiently of both methods depends positively on the level of recombination - the more recombination, the more efficiency, but the accuracy depends inversely on the level of recombination. I will also discuss a simple way to boost the accuracy of Clark's method.
The main emphasis will be on my most recent approach, based on viewing the haplotyping problem in the context of perfect phylogeny. The biological motivation for this is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected. This, along with assumption of infinite sites implies that any "permitted" solution to the haplotyping problem should fit a perfect phylogeny, This is a severe combinatorial constraint on the permitted solutions to the haplotyping problem, and it leads to efficient deterministic algorithms (I will talk about two of them) to deduce all features of the permitted haplotype solution(s) that can be known with certainty. We obtain a) an efficient algorithm to find (if one exists) one permitted solution to the haplotype problem; b) a simple test to determine if it is the unique permitted solution; c) an efficient way to count the number of permitted solutions; d) and an efficient way to implicitly represent the set of all permitted solutions so that each can be efficiently created. As a by-product, we prove that the number of permitted solutions is bounded by 2^k, where k is less than or equal to the number of sites. This is a huge reduction from the number of solutions if we do not require that a solution fit a perfect phylogeny. This implicit representation also allows simple solutions to secondary questions about the genotypes. For example, in the case that the tree is not unique, the representation explicitly reveals which phase relationships (if determined) would reduce the number of possible trees, and gives a conceptual solution to the problem of minimizing the number of haplotype pairs needed to be laboratory-determined in order to deduce the correct tree. We also observe a strong phase-transition (no pun intended) in the probability that a given number of genotypes will be sufficient to uniquely determine the tree.
Parts of this work are joint with different collaborators, including R.H. Chung, V. Bafna, G. Lancia, S. Orzack and S. Yooseph
Abstract:
Transcriptional activation in eukaryotic organisms normally requires
combinatorial interactions of multiple transcription factors. Though
several methods exist for identification of individual protein binding
site patterns in DNA sequences, there are few methods for the
discovery of binding site patterns for cooperatively acting factors. A
new method, Co-Bind (for COoperative BINDing) has been developed for
this purpose (1). This method and some results will be discussed.
From a set of muscle genes in C. elegans we identified several putative muscle-specific regulatory elements using DNA pattern recognition methods. Experimental verification of these regulatory motifs is being pursued (2). Through computational and experimental methods we have recently identified a new transcription regulatory element involved in C. elegans in heat-shock response that seem to act in cooperation with the previously well established HSF regulatory factor (3). Some data from these projects will be presented.
References:
Markstein et al., Genome-wide analysis of clustered Dorsal binding site identifies putative target genes in Drosophila embryo
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology
|
Department of Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX [comments to cse590cb-webmaster@cs.washington.edu] | |