HW3
---
I used a simple brute-force greedy algorithm to grow protein sets from COGs.
The COG is first filtered by removing all proteins that do not have an upstream
non-coding sequence of at least 50 bp.  The protein set is seeded with a protein
in the COG from a particular prokaryote (Thermoplasma acidophilum in my case).
Then, a CLUSTALW alignment is computed between the current protein set and each
remaining protein in the COG.  The protein that aligns best with the set is
merged in, and the process is repeated until the best alignment score falls
below a given threshold (0.5).
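
The filtering and greedy growth steps can be sketched roughly as follows.  This
is a minimal illustration, not the code used for the homework: the names
`filter_by_upstream`, `grow_protein_set`, `upstream_length`, and `score_fn` are
hypothetical, and `score_fn` stands in for whatever runs CLUSTALW on the current
set plus one candidate and returns the alignment score.  It also assumes that a
candidate scoring below the threshold is not merged.

```python
from typing import Callable, Dict, List, Sequence


def filter_by_upstream(
    proteins: Sequence[str],
    upstream_length: Dict[str, int],
    min_length: int = 50,
) -> List[str]:
    """Keep proteins whose upstream non-coding sequence is at least min_length bp.

    upstream_length is a hypothetical mapping from protein ID to the length
    (in bp) of its upstream non-coding sequence.
    """
    return [p for p in proteins if upstream_length.get(p, 0) >= min_length]


def grow_protein_set(
    seed: str,
    candidates: Sequence[str],
    score_fn: Callable[[List[str], str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Greedily grow a protein set starting from the seed protein.

    score_fn(current_set, candidate) is assumed to align the candidate
    against the current set and return a normalized alignment score.
    """
    protein_set = [seed]
    remaining = [p for p in candidates if p != seed]

    while remaining:
        # Brute force: score every remaining protein against the current set.
        scored = [(score_fn(protein_set, p), p) for p in remaining]
        best_score, best = max(scored, key=lambda pair: pair[0])

        # Stop once even the best candidate falls below the threshold.
        if best_score < threshold:
            break

        protein_set.append(best)
        remaining.remove(best)

    return protein_set
```

Each round realigns every remaining protein against the current set, so the
number of alignments grows quadratically with the size of the COG.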

Each symbol in an alignment that indicates conservation is given a value
(* = 1.0, : = 0.5, . = 0.3).  The score of an alignment is computed by summing
the values of the conservation symbols and then dividing by the length of the
alignment, so that long proteins are not favored.
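
As a rough sketch of this scoring rule, the function below takes the
conservation line of an alignment as a string and returns the normalized score.
The names `SYMBOL_VALUES` and `alignment_score` are illustrative, and the sketch
assumes the conservation line has already been extracted from the CLUSTALW
output.  Since trailing spaces on that line may be lost when reading a file, the
alignment length can be passed explicitly.

```python
from typing import Optional

# Values assigned to each conservation symbol; any other character
# (such as a space) contributes nothing.
SYMBOL_VALUES = {"*": 1.0, ":": 0.5, ".": 0.3}


def alignment_score(consensus: str, alignment_length: Optional[int] = None) -> float:
    """Sum the conservation symbol values and divide by the alignment length."""
    length = len(consensus) if alignment_length is None else alignment_length
    if length <= 0:
        return 0.0
    total = sum(SYMBOL_VALUES.get(symbol, 0.0) for symbol in consensus)
    return total / length
```

For example, a five-column alignment whose conservation line is `"**:. "` sums
to 2.8 and scores 2.8 / 5 = 0.56, which is above the 0.5 threshold.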