As for how I came up with this data, I did not deviate much from the procedure outlined in the homework. The main judgment call of my own was to pick only COGs that had a very high number of proteins with E-values below 1e-50. For each of the two sets I built, I could have included another 10 proteins, but due to time constraints I chose not to. I don't know whether this choice helped, but I ran into only a couple of cases where the noncoding upstream region was shorter than 50 bp; since I had plenty of other proteins available for the set, I simply discarded those cases and moved on.
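The two selection rules above (prefer COGs with many hits below an E-value of 1e-50, and drop any protein whose upstream noncoding region is shorter than 50 bp) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the homework's actual pipeline: the `Hit` record, its field names, the helper functions, and the toy COG data are all hypothetical, and the sketch assumes E-values and upstream-region lengths have already been parsed. Since "a very high number" was not quantified, the sketch simply ranks COGs by their count of strong hits.

```python
from dataclasses import dataclass

# Hypothetical record for one protein hit in a COG; field names are
# illustrative assumptions, not the homework's data format.
@dataclass
class Hit:
    protein_id: str
    e_value: float
    upstream_len: int  # length of the noncoding upstream region, in bp

E_VALUE_CUTOFF = 1e-50  # a "strong" hit has an E-value below this
MIN_UPSTREAM_BP = 50    # upstream regions shorter than this are discarded

def strong_hits(hits):
    """Return the hits whose E-value is below the cutoff."""
    return [h for h in hits if h.e_value < E_VALUE_CUTOFF]

def rank_cogs(cogs):
    """Order COG ids by their number of strong hits, most first."""
    return sorted(cogs, key=lambda cog: len(strong_hits(cogs[cog])), reverse=True)

def build_set(hits, size):
    """Take up to `size` strong hits, skipping short upstream regions."""
    chosen = []
    for h in strong_hits(hits):
        if h.upstream_len < MIN_UPSTREAM_BP:
            continue  # too little upstream sequence: discard it and keep going
        chosen.append(h)
        if len(chosen) == size:
            break
    return chosen

# Toy data (hypothetical COG ids and values).
cogs = {
    "COG_A": [Hit("p1", 1e-80, 200), Hit("p2", 1e-60, 30), Hit("p3", 1e-55, 120)],
    "COG_B": [Hit("q1", 1e-10, 300), Hit("q2", 1e-70, 150)],
}
best = rank_cogs(cogs)[0]                # "COG_A": 3 strong hits vs 1
picked = build_set(cogs[best], size=2)
print([h.protein_id for h in picked])    # ['p1', 'p3']; p2 is dropped (30 bp)
```

Because there are usually more qualifying proteins than needed, skipping a short-upstream case costs nothing; the loop just moves on to the next strong hit.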