CSE 590TV: Computational Biology
Homework 6
March 4, 2003

Reading

Jurka, J. and Batzer, M. A. (1996) "Human Repetitive Elements". Encyclopedia of Molecular Biology and Molecular Medicine, vol. 3, Robert A. Myers (ed.), 240-246.

Due March 12

In this homework you will investigate one family of repetitive elements in the human genome. Look in the file of samples of repeat sequences and locate your last name. Copy that line and the DNA sequence on the next line, which is the consensus sequence for your very own repeat family. These are the sort of long repeats that would cause problems for sequence assembly.

Determine the number of substrings in the human genome that are at least (a) 98%, (b) 95%, (c) 90%, and (d) 80% identical to your consensus sequence. If you use Human genome BLAST for this task, be sure to turn off filtering: the filter exists to omit repetitive elements from the output, which is exactly what you do not want here.
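
Once you have a percent identity for each hit, the four counts can be tallied as in the following minimal sketch. It assumes you have already extracted one percent-identity value per hit from your search output; the function name count_by_identity and the sample values are hypothetical, for illustration only. Note that the counts are cumulative: a hit that is 99% identical is counted under all four thresholds.

```python
from typing import Dict, Iterable, Sequence

# Thresholds (a) through (d) from the assignment, in percent.
THRESHOLDS = (98.0, 95.0, 90.0, 80.0)


def count_by_identity(
    identities: Iterable[float],
    thresholds: Sequence[float] = THRESHOLDS,
) -> Dict[float, int]:
    """Count how many hits are at least as identical as each threshold.

    `identities` holds one percent-identity value (0 to 100) per hit.
    How those values are obtained (e.g. from BLAST output) is left to you.
    """
    values = list(identities)
    return {t: sum(1 for v in values if v >= t) for t in thresholds}


# Hypothetical percent identities for six hits (not real data).
example = [99.1, 96.0, 96.5, 91.2, 85.0, 79.9]
counts = count_by_identity(example)
for threshold, n in counts.items():
    print(f">= {threshold:.0f}%: {n} hits")
```

With the made-up values above, the hit at 79.9% falls below every threshold and is not counted at all.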

Construct a profile (frequency of each nucleotide at each position, as in Table 7.2 in the lecture notes) for the collection of substrings at least 80% identical to your consensus.
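
As a rough illustration of the profile computation, the sketch below computes the frequency of each nucleotide at each position. It assumes the substrings have already been aligned to the consensus so that they all have the same length, and it ignores gap characters and ambiguous bases when computing frequencies; both are assumptions of this sketch, not requirements stated in the assignment. The function name build_profile and the toy sequences are hypothetical.

```python
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def build_profile(aligned: List[str]) -> Dict[str, List[float]]:
    """Return, for each nucleotide, its frequency at every position.

    `aligned` must be a non-empty list of equal-length strings.
    Characters other than A, C, G, T (such as '-' or 'N') are skipped,
    so each column's frequencies are taken over the counted bases only.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")

    profile = {n: [0.0] * length for n in NUCLEOTIDES}
    for pos in range(length):
        column = [s[pos].upper() for s in aligned]
        counted = [c for c in column if c in NUCLEOTIDES]
        if not counted:
            continue  # column holds only gaps or ambiguous bases
        for n in NUCLEOTIDES:
            profile[n][pos] = counted.count(n) / len(counted)
    return profile


# Toy alignment of four length-4 sequences (not real data).
toy = ["ACGT", "ACGA", "TCGT", "ACCT"]
profile = build_profile(toy)
for n in NUCLEOTIDES:
    print(n, profile[n])
```

For the toy alignment, position 1 is A in three of four sequences and T in one, so its column reads A = 0.75, T = 0.25, with C and G at 0.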


tompa@cs.washington.edu