HW3 --- I used a simple brute-force greedy algorithm to grow protein sets from COGs. The COG is first filtered by removing all proteins that do not have an upstream non-coding sequence of at least 50 bp. The protein set is then seeded with a protein in the COG from a particular prokaryote (Thermoplasma acidophilum in my case). Next, a CLUSTALW alignment is computed between the current protein set and each remaining protein in the COG. The protein that aligns best with the set is merged in, and the process is repeated until the best alignment score falls below a given threshold (0.5). To score an alignment, each conservation symbol in it is assigned a value (* = 1.0, : = 0.5, . = 0.3). The score is the sum of these values divided by the length of the alignment, so that long proteins are not favored simply for being long.
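
The scoring rule and the greedy loop described above can be sketched roughly as follows. This is only an illustration, not the code I actually ran: the names `score_alignment`, `grow_protein_set`, and `align_score` are hypothetical, and `align_score` stands in for whatever runs CLUSTALW on the current set plus one candidate and returns the normalized score. The sketch also assumes the conservation symbols are available as a single string and that the alignment length is passed in separately.

```python
from typing import Callable, Hashable, List, Sequence

# Values for the conservation symbols, as given in the text.
CONSERVATION_VALUES = {"*": 1.0, ":": 0.5, ".": 0.3}


def score_alignment(conservation: str, alignment_length: int) -> float:
    """Sum the symbol values and divide by the alignment length.

    Any character that is not a conservation symbol (e.g. a space)
    contributes 0 to the sum.
    """
    if alignment_length <= 0:
        return 0.0
    total = sum(CONSERVATION_VALUES.get(ch, 0.0) for ch in conservation)
    return total / alignment_length


def grow_protein_set(
    seed: Hashable,
    candidates: Sequence[Hashable],
    align_score: Callable[[List[Hashable], Hashable], float],
    threshold: float = 0.5,
) -> List[Hashable]:
    """Greedily merge the best-scoring candidate until the best score < threshold.

    `align_score(current_set, candidate)` is a hypothetical callback that
    aligns the candidate with the current set and returns its score.
    """
    protein_set = [seed]
    remaining = list(candidates)
    while remaining:
        # Brute force: score every remaining protein against the current set.
        scored = [(align_score(protein_set, p), p) for p in remaining]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score < threshold:
            break  # even the best candidate is below the threshold
        protein_set.append(best)
        remaining.remove(best)
    return protein_set
```

For example, `score_alignment("*: .", 4)` adds 1.0 + 0.5 + 0 + 0.3 and divides by 4. Because every remaining protein is re-aligned against the set on each round, the number of alignments grows quadratically with the size of the COG, which is the brute-force part.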