Prokaryote: Chlamydophila pneumoniae CWL029

Prokaryote                       Protein ID  Description                      Non-coding upstream DNA (bp)
-----------------------------------------------------------------------------------------------------------
Protein group 1:
Chlamydophila pneumoniae CWL029  15618801    Alanyl tRNA Synthetase           232
Chlamydia trachomatis            15605482    Alanyl tRNA Synthetase           5826
Escherichia coli K12             16130604    alanyl-tRNA synthetase           128
Pseudomonas aeruginosa PA01      15596100    alanyl-tRNA synthetase           121
Mesorhizobium loti               13470354    alanyl-tRNA synthetase           260
Haemophilus influenzae Rd        16272755    alanyl-tRNA synthetase           187

Protein group 2:
Chlamydophila pneumoniae CWL029  15618077    Leucyl tRNA Synthetase           167
Chlamydia trachomatis            15604929    Leucyl tRNA Synthetase           196
Lactococcus lactis subsp lactis  15672798    leucyl-tRNA synthetase           70
Bacillus halodurans              15615843    leucyl-tRNA synthetase           519
Bacillus subtilis                16080084    leucyl-tRNA synthetase           426
Streptococcus pyogenes           15674378    putative leucyl-tRNA synthetase  174

Protein group 3:
Chlamydophila pneumoniae CWL029  15618411    Prolyl tRNA Synthetase           299
Chlamydia trachomatis            15605118    Prolyl tRNA Synthetase           200
Thermotoga maritima              15643280    prolyl-tRNA synthetase           71
Escherichia coli O157:H7 EDL933  15799876    proline tRNA synthetase          110
Pseudomonas aeruginosa PA01      15596153    prolyl-tRNA synthetase           107
Escherichia coli K12             16128187    proline tRNA synthetase          111
Pasteurella multocida            15603235    ProS                             242

Procedure for finding each of the above protein groups:
I developed a heuristic for choosing COGs that are likely to contain many closely related (low e-value) proteins, and whose proteins are also likely to have at least 50 bp of non-coding upstream DNA. I used this procedure to find all 3 protein groups above. The heuristic is as follows:
From the COG list, choose COGs that contain 4 eukaryotes and 40-50 prokaryotes. Refine this group by keeping only the COGs that contain proteins from each type of prokaryotic organism. From this group, keep only the COGs that have "synthetase" in the COG description. Finally, take a random sample of the remaining COGs and keep the 3 with the lowest e-values for their first 10 proteins.
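
The heuristic can be sketched in Python roughly as follows. This is an illustration under assumptions, not the script actually used: the `Cog` record, its field names, and the `sample_size` parameter are hypothetical, the set of prokaryote types is supplied by the caller, and "lowest e-values for the first 10 proteins" is interpreted here as the lowest mean e-value over the first 10 proteins listed.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Cog:
    """Hypothetical summary of one COG entry (field names are assumptions)."""
    cog_id: str
    description: str
    n_eukaryotes: int
    n_prokaryotes: int
    prokaryote_types: set   # types of prokaryotic organism represented
    evalues: list           # e-values of member proteins, in listed order


def candidate_cogs(cogs, all_types):
    """Apply the three filters: organism counts, type coverage, description."""
    return [
        c for c in cogs
        if c.n_eukaryotes == 4
        and 40 <= c.n_prokaryotes <= 50
        and all_types <= c.prokaryote_types        # every type is present
        and "synthetase" in c.description.lower()
    ]


def pick_groups(cogs, all_types, sample_size=10, keep=3, seed=0):
    """Randomly sample the candidates, then keep the `keep` COGs with the
    lowest mean e-value over their first 10 proteins (assumed ranking)."""
    pool = candidate_cogs(cogs, all_types)
    rng = random.Random(seed)                      # seeded for repeatability
    sample = rng.sample(pool, min(sample_size, len(pool)))
    return sorted(sample, key=lambda c: mean(c.evalues[:10]))[:keep]
```

Each COG is assumed to list at least one e-value. The upstream DNA length is not checked here; after the groups are chosen, the non-coding upstream DNA of each protein still has to be confirmed to be at least 50 bp.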