To come up with this data, I didn't do much beyond what was outlined in the
homework. Probably the only judgment call of my own was that I picked only
COGs that had a very high number of proteins with E-values below 1e-50.
For each of the two sets I built, I could have included another 10 proteins,
but due to time constraints I chose not to. I don't know whether this
selection helped, but I ran into only a couple of cases where the noncoding
upstream region was shorter than 50 bp. Since I had plenty of other
proteins available for the set, I simply discarded those and kept going.
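The selection rules above are simple enough to express as a filter. The
sketch below is only illustrative and assumes the hits had already been
collected into simple records. The `Member` record, the function names, and
the `target_size` parameter are hypothetical. The only parts taken from the
text are the two thresholds: an E-value below 1e-50, and a minimum upstream
length of 50 bp.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is hypothetical.
E_VALUE_CUTOFF = 1e-50   # keep only very strong hits
MIN_UPSTREAM_BP = 50     # discard upstream regions shorter than this


@dataclass
class Member:
    """Hypothetical record for one COG member."""
    protein_id: str
    e_value: float
    upstream: str  # noncoding upstream sequence for this protein


def count_strong_hits(members):
    """Count members whose E-value is below the cutoff."""
    return sum(1 for m in members if m.e_value < E_VALUE_CUTOFF)


def rank_cogs(cogs):
    """Return COG names ordered by number of strong hits, most first."""
    return sorted(cogs, key=lambda name: count_strong_hits(cogs[name]),
                  reverse=True)


def build_set(members, target_size):
    """Pick up to target_size strong hits with long-enough upstream regions."""
    chosen = []
    for m in members:
        if m.e_value >= E_VALUE_CUTOFF:
            continue  # weak hit: not used
        if len(m.upstream) < MIN_UPSTREAM_BP:
            continue  # upstream too short: skip and move to the next one
        chosen.append(m)
        if len(chosen) == target_size:
            break
    return chosen
```

Because short upstream regions are just skipped, the set still fills up to
`target_size` as long as enough strong hits remain. This matches the
"discard it and keep going" approach described above.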