CSE 590TV: Computational Biology
Homework 5
February 12, 2003

Reading

Blanchette, M., Schwikowski, B., and Tompa, M. (2002) Algorithms for Phylogenetic Footprinting. Journal of Computational Biology 9:211-223.

Due February 19

In this homework you will use FootPrinter instead of global multiple alignment tools in your search for regulatory elements. If you are running Linux, I encourage you to download your own copy of FootPrinter rather than running it on the web server: our poor web server may get bogged down or crash if 30 of you try to use it simultaneously the night before the homework is due. Whichever you use, choose the HTML output option.

Use your 2-3 data sets of well-aligned proteins and corresponding noncoding upstream sequences produced in Homework 3. As in Homework 4, if any upstream sequence in your sets is longer than 300 bp, truncate it to just the 300 bp at the 3' end, that is, retain only the 300 bp closest to the start codon of its gene. For each of your 2-3 data sets, perform the following steps:

  1. You will need a phylogenetic gene tree in "bracket" notation as an input for FootPrinter. You can get this from CLUSTALW. Run CLUSTALW again on the proteins in your data set (the way you did in Homework 3 to produce the nice multiple alignment), but before aligning go to the bottom of the CLUSTALW input form and answer "Yes" to "Show tree in PHYLIP (bracket) notation". Near the bottom of the output page you will find a parenthesized representation of the gene tree. Copy this to a file.
  2. Use FootPrinter to search for conserved motifs in your noncoding upstream sequences. One of the inputs you will need is the gene tree from step 1 above. You will have to experiment with the other parameters in order to find an appealing set of motifs. Your goal is to find a small set of motifs (say, 1-5 motifs) that each occur in most of the sequences, and that occur in the same order in most of the sequences. Achieving this counts as a success, but you may not be able to succeed with all of your data sets. Motifs should not be "low complexity", e.g., consisting almost entirely of T's. There are some online guidelines to help you adjust your parameters. Any motif found is a plausible candidate for a regulatory element of the downstream gene, that is, the binding site of some protein that regulates the gene's expression.
  3. When you are satisfied with the motifs, turn in the URL for FootPrinter's HTML output. (If you are using the FootPrinter web server, turn in the URL of the output page it produces, but also use your browser's "Save as" function to save a backup copy locally in case something later goes wrong with the server's copy; this backup copy will be missing the images at the top of the page. If you are using a downloaded version of FootPrinter, make a local web page with your output and turn in its URL; in order to get the images at the top of the page, you will need the .fagif directory FootPrinter produces.) Report on the differences between the motifs found in this homework and those found in Homework 4: for each motif found by FootPrinter, did CLUSTALW and DIALIGN succeed in aligning its instances? For each motif you claimed in Homework 4, did FootPrinter succeed in finding it? Be sure to include the names of the prokaryotes, the gene identification numbers, and a listing of the FootPrinter parameter values you used (from the FootPrinter "Command line").
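
The truncation rule stated before the steps (keep only the 300 bp closest to the start codon) can be sketched in a few lines of Python. This is only an illustrative helper, not part of FootPrinter: the function name `truncate_upstream` is hypothetical, and the sketch assumes each upstream sequence is written 5' to 3' on the gene's coding strand, so that the start codon follows the last base of the string.

```python
def truncate_upstream(seq, max_len=300):
    """Keep only the max_len bases at the 3' end of an upstream sequence.

    Assumption: seq is written 5'->3' on the coding strand, so the bases
    nearest the start codon are at the END of the string.
    """
    if len(seq) > max_len:
        return seq[-max_len:]  # drop the extra bases at the 5' end
    return seq                 # sequences of 300 bp or fewer are left unchanged


# Example: a 350 bp sequence loses its first 50 bases.
example = "A" * 50 + "C" * 300
assert truncate_upstream(example) == "C" * 300
```

If your sequences are stored on the opposite strand, you would need to reverse-complement them before applying this.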

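Before giving the gene tree from step 1 to FootPrinter, it may be worth a quick sanity check that the copied text is intact. The sketch below checks that the parentheses balance and lists the leaf names, so you can compare them by eye with the names of your upstream sequences. It assumes the usual bracket form that CLUSTALW prints, e.g. `((a:0.1,b:0.2):0.05,c:0.3);`, with simple unquoted names; the function name `leaf_names` is hypothetical, and whether FootPrinter requires the names to match exactly is an assumption you should confirm against its documentation.

```python
import re


def leaf_names(tree):
    """Return the leaf names of a bracket-notation tree, in left-to-right order.

    Assumption: names are unquoted and contain no whitespace, parentheses,
    colons, commas, or semicolons.
    """
    if tree.count("(") != tree.count(")"):
        raise ValueError("unbalanced parentheses: the tree may be truncated")
    # A leaf name is a token that directly follows "(" or "," (ignoring
    # whitespace and line breaks). Labels that follow ")" are skipped.
    return re.findall(r"[(,]\s*([^\s():,;]+)", tree)


# Example with a made-up three-leaf tree split across lines.
tree = "(\n(\nseqA:0.10,\nseqB:0.20)\n:0.05,\nseqC:0.30);"
assert leaf_names(tree) == ["seqA", "seqB", "seqC"]
```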

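Step 2 asks you to avoid "low complexity" motifs. One rough way to screen for them is to measure how much of a motif is made up of its single most common base. The sketch below does this; the function name `is_low_complexity` and the 0.8 threshold are arbitrary choices of this example, not values taken from FootPrinter or its guidelines, so treat the result as a hint rather than a verdict.

```python
from collections import Counter


def is_low_complexity(motif, threshold=0.8):
    """Return True if one base accounts for at least `threshold` of the motif.

    The default threshold of 0.8 is an illustrative assumption only.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    counts = Counter(motif.upper())
    most_common_fraction = max(counts.values()) / len(motif)
    return most_common_fraction >= threshold


# Example: 9 of 10 bases are T, so this motif is flagged.
assert is_low_complexity("TTTTTTTATT")
# Example: no base exceeds 30% here, so this motif is not flagged.
assert not is_low_complexity("ACGTACGTAC")
```
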
tompa@cs.washington.edu