CSE 590TV: Computational Biology
Homework 5
February 12, 2003
Reading
Blanchette, M., Schwikowski, B., and Tompa, M. (2002)
Algorithms for Phylogenetic Footprinting.
Journal of Computational Biology 9:211-223.
Due February 19
In this homework you will use FootPrinter
instead of global multiple alignment tools in your search for
regulatory elements. If you are running Linux, I encourage you to
download your own copy of FootPrinter rather than running it on the web
server. In particular, our poor web server may get bogged down or crash if
30 of you try to use it simultaneously the night before the homework
is due. Whichever you use, choose the HTML output option.
Use your 2-3 data sets of well-aligned proteins
and corresponding noncoding upstream sequences produced in Homework 3. As in
Homework 4, if any upstream sequence in your sets is longer than 300
bp, truncate it to just the 300 bp at the 3' end, that is, retain only
the 300 bp closest to the start codon of its gene. For each of your
2-3 data sets, perform the following steps:
- You will need a phylogenetic gene tree in "bracket" notation as an
input for FootPrinter. You can get this from CLUSTALW. Run CLUSTALW
again on the proteins in your data set (the way you did in Homework 3 to
produce the nice multiple alignment), but before aligning go to the
bottom of the CLUSTALW input form and answer "Yes" to "Show tree in
PHYLIP (bracket) notation". Near the bottom of the output page you
will find a parenthesized representation of the gene tree. Copy this
to a file.
- Use FootPrinter to
search for conserved motifs in your noncoding upstream sequences. One
of the inputs you will need is the gene tree from step 1 above. You
will have to experiment with the other parameters in order to find an
appealing set of motifs. Your goal is to find a small set of motifs
(say, 1-5 motifs) that each occur in most of the sequences, such that
the motifs occur in the same order in most of the sequences; this
would be a success, and you may not be able to succeed with all of
your data sets. Motifs should not be "low complexity", e.g.,
consisting almost entirely of T's. There are some online guidelines
to help you adjust your parameters. Any motif found is a plausible
candidate as a regulatory element for the downstream gene, that is,
the binding site of some protein that regulates the gene's expression.
- When you are satisfied with the motifs, turn in the URL for
FootPrinter's HTML output. (If you are using the FootPrinter web
server, this is the URL of FootPrinter's HTML output, but you should
also use your browser's "Save as" function to save a backup copy for
yourself locally in case something later goes wrong with the server's
copy; this backup copy will be missing the images at the top of the
page. If you are using a downloaded version of FootPrinter, make a
local web page with your output and turn in its URL; in order to get
the images at the top of the page, you will need the .fagif directory
FootPrinter produces.) Report on the differences between the motifs found
in this homework and those found in Homework 4: for each motif found by FootPrinter,
did CLUSTALW and DIALIGN succeed in aligning its instances? For each
motif you claimed in Homework 4, did FootPrinter succeed in finding
it? Be sure to include the names of the prokaryotes, the gene
identification numbers, and a listing of FootPrinter's parameter
values used (from the FootPrinter "Command line").
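
The mechanical parts of the steps above (truncating upstream sequences to 300 bp, checking that the bracket-notation tree names the same genes as your sequence file, and screening motifs for low complexity) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of FootPrinter or CLUSTALW: the function names are hypothetical, the tree parser assumes simple unquoted leaf names, and the 0.75 cutoff for "low complexity" is an arbitrary assumption you should adjust to your own judgment.

```python
import re
from collections import Counter
from typing import List

MAX_UPSTREAM_BP = 300  # length limit stated in the assignment


def truncate_upstream(seq: str, max_len: int = MAX_UPSTREAM_BP) -> str:
    """Keep only the max_len bases at the 3' end (closest to the start codon)."""
    return seq if len(seq) <= max_len else seq[-max_len:]


def tree_leaf_names(newick: str) -> List[str]:
    """Return the leaf labels of a bracket-notation tree, left to right.

    A leaf label is taken to be any name that directly follows '(' or ','.
    Assumption: labels are unquoted and contain none of ( ) , : ;
    """
    compact = "".join(newick.split())  # the tree may be wrapped over several lines
    return re.findall(r"[(,]([^(),:;]+)", compact)


def dominant_base_fraction(motif: str) -> float:
    """Fraction of positions occupied by the single most common base."""
    if not motif:
        raise ValueError("motif must be non-empty")
    counts = Counter(motif.upper())
    return max(counts.values()) / len(motif)


def is_low_complexity(motif: str, threshold: float = 0.75) -> bool:
    """Flag motifs dominated by one base; the threshold is an arbitrary assumption."""
    return dominant_base_fraction(motif) >= threshold
```

For example, comparing `set(tree_leaf_names(tree_text))` with the set of sequence names in your upstream file is a quick way to catch a gene that appears in one input but not the other before you run FootPrinter.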
tompa@cs.washington.edu