This uses the proteins:
Location          Strand  Length  PID       Gene  COG      Locus   Product
1006797..1007897  +       367     11498732  eno   COG0148  AF1132  enolase (eno)
21689..22939      +       417     15678072        COG0148  MTH43   enolase
326762..328042    +       427     15605961  eno   COG0148  aq_484  enolase
899543..900826    -       428     15643639        COG0148  TM0877  enolase
3656960..3658249  -       430     15616118  eno   COG0148  BH3556  enolase (2-phosphoglycerate dehydratase)

A cached version of the amino acid BLAST may be found at http://www.ncbi.nlm.nih.gov/cgi-bin/COG/blux?cog=AF1132&AF1132
Highlights (sequences producing significant alignments):
Sequence   Score (bits)  E value
AF1132     712           0.0
MTH43      378           e-105
...
aq_484     310           2e-84
TM0877     305           6e-83
BH3556     293           3e-79

with protein sequences:
>gi|11498732|ref|NP_069961.1| enolase (eno) [Archaeoglobus fulgidus]
MAPSGASTGSGEAVVVSPYRYEEIEEEVSKAIIGMSVFDQESVDEALRELDGTDNFSRIGGNFAITASLA
VAKAAAEILGLPLYAYVGGVFAKELPYPLGNVIGGGRHAEGSTSIQEFLVIPVGAKTFFEAQRANAAVHK
QLKKIFKERGIFAAKGDEGAWAAQISDEQAFEILSEAIQRVEDELGVKVRMGIDVAATELWDGERYVYSD
RKLTTEEQIAYMAELADRYDLLYIEDPLHEKDFEGFAELTKQVKCMVCGDDIFVTNPEIIKKGIEVGAAN
TVLIKPNQNGTLSGTAKAVKIAKDNGYSVVVSHRSGETEDETLAHLAVAFNAKLIKTGVVGGERISKLNE
LIRIEELMDKPRMVMI
>gi|15678072|ref|NP_275186.1| enolase [Methanothermobacter thermautotrophicus]
MESIIEDVRVRKILDSRGNPTVEVDVITWNGFGRAAAPSGASTGSREVAAFPSGGVDEIITEVEDIISSE
LIGMDAVDLQDIDLVLKEIDGTENLSSLGGNTVVAVSMATAKAAASSYNMPLYRFLGGNLATSIPYPLGN
MINGGAHAGKNAPDIQEFLVVPVGAEDITEAVFANAAVHKRIRELIQKKDPSFAGGKGDEGGWVPSLSNG
DALEIQATACEEVTDELGVEVRPSLDLAASEFWDPEIEKYVYRQENVQKDTGEQIEFVKEIIETYDMYYV
EDPLHEGDLEGFAELTSLVGDRCMICGDDIFVTNREILREGIEMGAANAIIIKPNQIGTLTDTYLTVKLA
LENRYTPVVSHRSGETTDDTIAHLAVAFGAPLIKTGAIGGERIAKLNELIRIQEEIPYSRMADLPF
>gi|15605961|ref|NP_213338.1| enolase [Aquifex aeolicus]
MSRIKRVHGREVLDSRGNPTVEVEVELESGALGRAIVPSGASTGEREALELRDGDPKRYLGKGVLKAVDN
VNGVIAKALVGLEPYNQREIDQILIELDGTENKSKLGANAILGTSMAVARAAANELGIPLYEYLGGKFGY
RLPVPLMNVINGGAHADNNLDIQEFMIVPVCGGAFREALRAGVETFHHLKKILKEKGYSTNVGDEGGFAP
NLNSSEEALDILMQAIEKAGYKPGEDILLALDVASSEFYENGVYKFEGKERSAEEMIEFYEKLIQKYPII
SIEDPMSENDWEGWKEITKRLGDKVQLVGDDLFTTNPKILRKGIEEGVANAILVKLNQIGTVSETLDTVM
LAKERNYSAIISHRSGETEDTFISHLAVATNAGQIKTGSASRTDRIAKYNELLRIEERLGNGAVFWGREE
FYRFTS
>gi|15643639|ref|NP_228685.1| enolase [Thermotoga maritima]
MEIVDVRAREVLDSRGNPTVEAEVVLEDGTMGRAIVPSGASTGKFEALEIRDKDKKRYLGKGVLKAVENV
NETIAPALIGMNAFDQPLVDKTLIELDGTENKSKLGANAILAVSMAVARAAANYLGLPLYKYLGGVNAKV
LPVPLMNVINGGQHADNNLDLQEFMIVPAGFDSFREALRAGAEIFHTLKKILHEAGHVTAVGDEGGFAPN
LSSNEEAIKVLIEAIEKAGYKPGEEVFIALDCAASSFYDEEKGVYYVDGEEKSSEVLMGYYEELVAKYPI
ISIEDPFAEEDWDAFVEFTKRVGNKVQIVGDDLYVTNVKRLSKGIELKATNSILIKLNQIGTVTETLDAV
EMAQKNNMTAIISHRSGESEDTFIADLAVATNAGFIKTGSLSRSERIAKYNQLLRIEEELGKVAEFRGLK
SFYSIKR
>gi|15616118|ref|NP_244423.1| enolase (2-phosphoglycerate dehydratase) [Bacillus halodurans]
MTIITDVYAREVLDSRGNPTVEVEVYLESGAMGRALVPSGASTGEYEAVELRDGGERFLGKGVLKAVENV
NEVIAPELIGFDALDQIGIDQHMIELDGTENKGKLGANAILGVSMAVARAAANALDLPLYVYLGGFNAKQ
LPVPMMNIINGGEHADNNVDIQEFMIMPVGAESFKEALRTGTEIFHSLKKVLKSKGYNTAVGDEGGFAPN
LSSNEEALQTIIEAIEQAGYTPGEQVKLAMDVASSELYNKEDGKYHLSGEGKVLSSEEMVAFYEELVAKY
PIISIEDGLDENDWEGHKMLTDRLGDKVQLVGDDLFVTNTKKLAQGIEQGVGNSILIKVNQIGTLTETFD
AIEMAKRAGYTAVISHRSGETEDSTIADIAVATNAGQIKTGAPSRTDRVAKYNQLLRIEDELGNLAQYNG
LQSFYNLKK

with ClustalW output:
Please wait ...
CLUSTAL W (1.81) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: gi|11498732|ref|NP_069961.1| 366 aa
Sequence 2: gi|15678072|ref|NP_275186.1| 416 aa
Sequence 3: gi|15605961|ref|NP_213338.1| 426 aa
Sequence 4: gi|15643639|ref|NP_228685.1| 427 aa
Sequence 5: gi|15616118|ref|NP_244423.1| 429 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 53
Sequences (2:3) Aligned. Score: 43
Sequences (4:5) Aligned. Score: 68
Sequences (3:4) Aligned. Score: 67
Sequences (1:3) Aligned. Score: 44
Sequences (2:4) Aligned. Score: 43
Sequences (3:5) Aligned. Score: 67
Sequences (1:4) Aligned. Score: 46
Sequences (2:5) Aligned. Score: 44
Sequences (1:5) Aligned. Score: 43
Guide tree file created: [clustal.dnd]
Start of Multiple Alignment
There are 4 groups
Aligning...
Group 1: Sequences: 2 Score:7731 Group 2: Sequences: 3 Score:7751 Group 3: Sequences: 2 Score:6063 Group 4: Sequences: 5 Score:5928 Alignment Score 12023 CLUSTAL-Alignment file created [clustal.aln] -------------------------------------------------------------------------------- Clustal output CLUSTAL W (1.81) multiple sequence alignment gi|15605961|ref|NP_213338.1| -MSRIKRVHGREVLDSRGNPTVEVEVELESGALGRAIVPSGASTGEREAL gi|15616118|ref|NP_244423.1| -MTIITDVYAREVLDSRGNPTVEVEVYLESGAMGRALVPSGASTGEYEAV gi|15643639|ref|NP_228685.1| --MEIVDVRAREVLDSRGNPTVEAEVVLEDGTMGRAIVPSGASTGKFEAL gi|11498732|ref|NP_069961.1| ------------------------------------MAPSGASTGSGEAV gi|15678072|ref|NP_275186.1| MESIIEDVRVRKILDSRGNPTVEVDVITWN-GFGRAAAPSGASTGSREVA .*******. *. gi|15605961|ref|NP_213338.1| ELRDGDPKRYLGKGVLKAVDNVNGVIAKALVGLEPYNQREIDQILIELDG gi|15616118|ref|NP_244423.1| ELRDGG-ERFLGKGVLKAVENVNEVIAPELIGFDALDQIGIDQHMIELDG gi|15643639|ref|NP_228685.1| EIRDKDKKRYLGKGVLKAVENVNETIAPALIGMNAFDQPLVDKTLIELDG gi|11498732|ref|NP_069961.1| VVSP------------YRYEEIEEEVSKAIIGMSVFDQESVDEALRELDG gi|15678072|ref|NP_275186.1| AFPSGG--------VDEIITEVEDIISSELIGMDAVDLQDIDLVLKEIDG . ::: :: ::*:. : :* : *:** gi|15605961|ref|NP_213338.1| TENKSKLGANAILGTSMAVARAAANELGIPLYEYLGGKFGYRLPVPLMNV gi|15616118|ref|NP_244423.1| TENKGKLGANAILGVSMAVARAAANALDLPLYVYLGGFNAKQLPVPMMNI gi|15643639|ref|NP_228685.1| TENKSKLGANAILAVSMAVARAAANYLGLPLYKYLGGVNAKVLPVPLMNV gi|11498732|ref|NP_069961.1| TDNFSRIGGNFAITASLAVAKAAAEILGLPLYAYVGGVFAKELPYPLGNV gi|15678072|ref|NP_275186.1| TENLSSLGGNTVVAVSMATAKAAASSYNMPLYRFLGGNLATSIPYPLGNM *:* . :*.* : .*:*.*:***. .:*** ::** . 
:* *: *: gi|15605961|ref|NP_213338.1| INGGAHADN-NLDIQEFMIVPVCGGAFREALRAGVETFHHLKKILKEK-- gi|15616118|ref|NP_244423.1| INGGEHADN-NVDIQEFMIMPVGAESFKEALRTGTEIFHSLKKVLKSK-- gi|15643639|ref|NP_228685.1| INGGQHADN-NLDLQEFMIVPAGFDSFREALRAGAEIFHTLKKILHEA-- gi|11498732|ref|NP_069961.1| IGGGRHAEG-STSIQEFLVIPVGAKTFFEAQRANAAVHKQLKKIFKER-- gi|15678072|ref|NP_275186.1| INGGAHAGKNAPDIQEFLVVPVGAEDITEAVFANAAVHKRIRELIQKKDP *.** ** .:***:::*. : ** :.. .: ::::::. gi|15605961|ref|NP_213338.1| GYSTNVGDEGGFAPNLNSSEEALDILMQAIEKAGYKPGEDILLALDVASS gi|15616118|ref|NP_244423.1| GYNTAVGDEGGFAPNLSSNEEALQTIIEAIEQAGYTPGEQVKLAMDVASS gi|15643639|ref|NP_228685.1| GHVTAVGDEGGFAPNLSSNEEAIKVLIEAIEKAGYKPGEEVFIALDCAAS gi|11498732|ref|NP_069961.1| GIFAAKGDEGAWAAQIS-DEQAFEILSEAIQRVEDELGVKVRMGIDVAAT gi|15678072|ref|NP_275186.1| SFAGGKGDEGGWVPSLS-NGDALEIQATACEEVTDELGVEVRPSLDLAAS . ****.:...:. . :*:. * :.. * .: .:* *:: gi|15605961|ref|NP_213338.1| EFY--ENGVYKF--EGKERSAEEMIEFYEKLIQKYPIISIEDPMSENDWE gi|15616118|ref|NP_244423.1| ELYNKEDGKYHLSGEGKVLSSEEMVAFYEELVAKYPIISIEDGLDENDWE gi|15643639|ref|NP_228685.1| SFYDEEKGVYYV--DGEEKSSEVLMGYYEELVAKYPIISIEDPFAEEDWD gi|11498732|ref|NP_069961.1| ELWD--GERYVYSDR--KLTTEEQIAYMAELADRYDLLYIEDPLHEKDFE gi|15678072|ref|NP_275186.1| EFWDPEIEKYVYRQENVQKDTGEQIEFVKEIIETYDMYYVEDPLHEGDLE .:: * : : : :: * : :** : * * : gi|15605961|ref|NP_213338.1| GWKEITKRLGDKVQLVGDDLFTTNPKILRKGIEEGVANAILVKLNQIGTV gi|15616118|ref|NP_244423.1| GHKMLTDRLGDKVQLVGDDLFVTNTKKLAQGIEQGVGNSILIKVNQIGTL gi|15643639|ref|NP_228685.1| AFVEFTKRVGNKVQIVGDDLYVTNVKRLSKGIELKATNSILIKLNQIGTV gi|11498732|ref|NP_069961.1| GFAELTKQV--KCMVCGDDIFVTNPEIIKKGIEVGAANTVLIKPNQNGTL gi|15678072|ref|NP_275186.1| GFAELTSLVGDRCMICGDDIFVTNREILREGIEMGAANAIIIKPNQIGTL . :*. : : : ***::.** : : :*** . 
*::::* ** **: gi|15605961|ref|NP_213338.1| SETLDTVMLAKERNYSAIISHRSGETEDTFISHLAVATNAGQIKTGSASR gi|15616118|ref|NP_244423.1| TETFDAIEMAKRAGYTAVISHRSGETEDSTIADIAVATNAGQIKTGAPSR gi|15643639|ref|NP_228685.1| TETLDAVEMAQKNNMTAIISHRSGESEDTFIADLAVATNAGFIKTGSLSR gi|11498732|ref|NP_069961.1| SGTAKAVKIAKDNGYSVVVSHRSGETEDETLAHLAVAFNAKLIKTGVVG- gi|15678072|ref|NP_275186.1| TDTYLTVKLALENRYTPVVSHRSGETTDDTIAHLAVAFGAPLIKTGAIG- : * :: :* : ::******: * ::.:*** .* **** . gi|15605961|ref|NP_213338.1| TDRIAKYNELLRIEERLGNGAVFWGREEFYRFTS gi|15616118|ref|NP_244423.1| TDRVAKYNQLLRIEDELGNLAQYNGLQSFYNLKK gi|15643639|ref|NP_228685.1| SERIAKYNQLLRIEEELGKVAEFRGLKSFYSIKR gi|11498732|ref|NP_069961.1| GERISKLNELIRIEELMD-------KPRMVMI-- gi|15678072|ref|NP_275186.1| GERIAKLNELIRIQEEIP-------YSRMADLPF :*::* *:*:**:: : : : -------------------------------------------------------------------------------- Tree construction CLUSTAL W (1.81) Multiple Sequence Alignments Sequence format is Clustal Sequence 1: gi|15605961|ref|NP_213338.1| 434 aa Sequence 2: gi|15616118|ref|NP_244423.1| 434 aa Sequence 3: gi|15643639|ref|NP_228685.1| 434 aa Sequence 4: gi|11498732|ref|NP_069961.1| 434 aa Sequence 5: gi|15678072|ref|NP_275186.1| 434 aa Phylogenetic tree file created: [clustal.ph] +----------------1:gi|15605961|ref|NP_213338.1| +--7 | +----------------2:gi|15616118|ref|NP_244423.1| --6 | +----------------3:gi|15643639|ref|NP_228685.1| +--8 | +------------------------4:gi|11498732|ref|NP_069961.1| +-----------------9 +-------------------------5:gi|15678072|ref|NP_275186.1| Midpoint-rooted tree: +------------------------1:gi|15605961|ref|NP_213338.1| +--7 +------------------8 +------------------------2:gi|15616118|ref|NP_244423.1| | | --6 +------------------------3:gi|15643639|ref|NP_228685.1| | | +------------------------------------4:gi|11498732|ref|NP_069961.1| +------9 +-------------------------------------5:gi|15678072|ref|NP_275186.1| Remember, this is an unrooted tree! 
--------------------------------------------------------------------------------
( ( gi|15605961|ref|NP_213338.1|:0.15745, gi|15616118|ref|NP_244423.1|:0.15785) :0.00495, gi|15643639|ref|NP_228685.1|:0.15703, ( gi|11498732|ref|NP_069961.1|:0.22722, gi|15678072|ref|NP_275186.1|:0.23453) :0.16372);
--------------------------------------------------------------------------------
Bootstrapping tree
CLUSTAL W (1.81) Multiple Sequence Alignments
Sequence format is Clustal
Sequence 1: gi|15605961|ref|NP_213338.1| 434 aa
Sequence 2: gi|15616118|ref|NP_244423.1| 434 aa
Sequence 3: gi|15643639|ref|NP_228685.1| 434 aa
Sequence 4: gi|11498732|ref|NP_069961.1| 434 aa
Sequence 5: gi|15678072|ref|NP_275186.1| 434 aa
Phylogenetic tree file created: [clustal.ph]
No bootstrap generated
This is probably caused by too few input sequences.
CLUSTAL needs more than three sequences in order to be able to calculate a phylogenetic tree.
--------------------------------------------------------------------------------
RunClustalW--V1.4--21-Oct-2002/JackL

This gives the following upstream regions (corrected as necessary for negative strand):
>AF1132
AGGTGGTAGCTTGATAATTGAGGACGTTCACTACAGAGTCGTTTTCGACAGCA
GGGGAAACGAGACGGTTGAGTGTGAAGTCGTTGCTGGGGAGGTTGTTGCGAAGGCGA
>MTH43
AACTCTAATAAACCACTATTTTTCAGTCTAGAGTTTATTATGAGGTGTTTTAA
>aq_484
ACCTTGTAGTCGTCCCAG
TATCCTAACATGGGAACCCAGTCGGGTATCCTGTCCTTTTCTTCTATGAAGTACCAAAGAGCTGCTGTAA
AATCTTCCCTTGCCTCCTCACTCAGGTCAAAGTTAGGGTCGGAGAGTATCCTGAAGAGGAGTTTGGCGTC
CAGAATCAGGTTTCTGACGTACTCCATGGTGGGAGGGACTTTTGCGAGTTTCCTTCTAAATGCGTTCTGT
AATTTATAGAAATCCCTGTATTCCTCTCCTTTGAACCTTTCCAGGTACTTTTCCATAGTAAGATAGTATA
CA
>TM0877
ACATAAAAGGGTACAATAGAACCTTGGTGGATCAAAAAATCAAGGAGGTGGAAACTGT
GTACG
>BH3556
AA
CGTTTGTTTAAGGTTCGATCAAGGGGCAGGATTTCCATCCCGTACCTTGTAAGATTAAAAAGAAGAACAG
AAAAAAGGAGATGGATTTGAGA

The same upstream regions relabelled with species names, followed by the correspondingly relabelled ClustalW tree:
>ArchaeoglobusFulgidus
AGGTGGTAGCTTGATAATTGAGGACGTTCACTACAGAGTCGTTTTCGACAGCA
GGGGAAACGAGACGGTTGAGTGTGAAGTCGTTGCTGGGGAGGTTGTTGCGAAGGCGA
>MethanothermobacterThermautotrophicus
AACTCTAATAAACCACTATTTTTCAGTCTAGAGTTTATTATGAGGTGTTTTAA
>AquifexAeolicus
ACCTTGTAGTCGTCCCAG
TATCCTAACATGGGAACCCAGTCGGGTATCCTGTCCTTTTCTTCTATGAAGTACCAAAGAGCTGCTGTAA
AATCTTCCCTTGCCTCCTCACTCAGGTCAAAGTTAGGGTCGGAGAGTATCCTGAAGAGGAGTTTGGCGTC
CAGAATCAGGTTTCTGACGTACTCCATGGTGGGAGGGACTTTTGCGAGTTTCCTTCTAAATGCGTTCTGT
AATTTATAGAAATCCCTGTATTCCTCTCCTTTGAACCTTTCCAGGTACTTTTCCATAGTAAGATAGTATA
CA
>ThermotogaMaritima
ACATAAAAGGGTACAATAGAACCTTGGTGGATCAAAAAATCAAGGAGGTGGAAACTGT
GTACG
>BacillusHalodurans
AA
CGTTTGTTTAAGGTTCGATCAAGGGGCAGGATTTCCATCCCGTACCTTGTAAGATTAAAAAGAAGAACAG
AAAAAAGGAGATGGATTTGAGA

( ( AquifexAeolicus:0.15745, BacillusHalodurans:0.15785) :0.00495, ThermotogaMaritima:0.15703, ( ArchaeoglobusFulgidus:0.22722, MethanothermobacterThermautotrophicus:0.23453) :0.16372);

Using the phylogenetic tree above does not yield any results, so instead we use FootPrinter's standard tree, relabelling the sequences as follows:
>ARCHAEOGLOBUS
AGGTGGTAGCTTGATAATTGAGGACGTTCACTACAGAGTCGTTTTCGACAGCA
GGGGAAACGAGACGGTTGAGTGTGAAGTCGTTGCTGGGGAGGTTGTTGCGAAGGCGA
>METHANOBACTERIUM
AACTCTAATAAACCACTATTTTTCAGTCTAGAGTTTATTATGAGGTGTTTTAA
>AQUIFEX
ACCTTGTAGTCGTCCCAG
TATCCTAACATGGGAACCCAGTCGGGTATCCTGTCCTTTTCTTCTATGAAGTACCAAAGAGCTGCTGTAA
AATCTTCCCTTGCCTCCTCACTCAGGTCAAAGTTAGGGTCGGAGAGTATCCTGAAGAGGAGTTTGGCGTC
CAGAATCAGGTTTCTGACGTACTCCATGGTGGGAGGGACTTTTGCGAGTTTCCTTCTAAATGCGTTCTGT
AATTTATAGAAATCCCTGTATTCCTCTCCTTTGAACCTTTCCAGGTACTTTTCCATAGTAAGATAGTATA
CA
>THERMOTOGA
ACATAAAAGGGTACAATAGAACCTTGGTGGATCAAAAAATCAAGGAGGTGGAAACTGT
GTACG
>BACILLUS
AA
CGTTTGTTTAAGGTTCGATCAAGGGGCAGGATTTCCATCCCGTACCTTGTAAGATTAAAAAGAAGAACAG
AAAAAAGGAGATGGATTTGAGA

There are very few motifs in the FootPrinter output for this set: the parameters have to be very generous (see the maximum number of mutations) to get any motifs at all. Unfortunately, if we allow regulatory element loss, the Archaeoglobus fulgidus sequence (my prokaryote) is usually the one that gets lost. Since that is not interesting in the context of this homework, I do not allow regulatory element loss.
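The species-named tree above was obtained from the ClustalW tree by swapping each `gi|...|` leaf label for a species name. That substitution is mechanical; below is a minimal Python sketch of it, assuming leaf labels contain no whitespace, parentheses, commas, colons or semicolons (true for these labels). The `LABELS` mapping and the `relabel_newick` helper are illustrative names, not part of ClustalW or FootPrinter.

```python
import re

# Illustrative mapping from ClustalW leaf labels to the species names used
# as FASTA headers in the relabelled enolase set.
LABELS = {
    "gi|15605961|ref|NP_213338.1|": "AquifexAeolicus",
    "gi|15616118|ref|NP_244423.1|": "BacillusHalodurans",
    "gi|15643639|ref|NP_228685.1|": "ThermotogaMaritima",
    "gi|11498732|ref|NP_069961.1|": "ArchaeoglobusFulgidus",
    "gi|15678072|ref|NP_275186.1|": "MethanothermobacterThermautotrophicus",
}

# A label is a run of characters other than whitespace and Newick
# punctuation that is immediately followed by ':' (its branch length).
_LABEL = re.compile(r"[^\s(),:;]+(?=:)")

def relabel_newick(newick: str, labels: dict) -> str:
    """Replace every label found in `labels`; unknown labels are left as-is.

    Whitespace and branch lengths are preserved unchanged.
    """
    return _LABEL.sub(lambda m: labels.get(m.group(0), m.group(0)), newick)

CLUSTAL_TREE = (
    "( ( gi|15605961|ref|NP_213338.1|:0.15745, gi|15616118|ref|NP_244423.1|:0.15785) :0.00495, "
    "gi|15643639|ref|NP_228685.1|:0.15703, ( gi|11498732|ref|NP_069961.1|:0.22722, "
    "gi|15678072|ref|NP_275186.1|:0.23453) :0.16372);"
)

if __name__ == "__main__":
    # Prints the species-named enolase tree listed above.
    print(relabel_newick(CLUSTAL_TREE, LABELS))
```

Branch lengths such as `0.15745` are never matched because they are followed by `,` or `)` rather than `:`. The same function with an upper-case mapping (ARCHAEOGLOBUS, METHANOCOCCUS, ...) would relabel the phosphoglycerate kinase tree in the same way.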
Using the following parameters on the FootPrinter 2.0 web server:
--------------------------------------------------------------------------------
Input sequences (FASTA):          the upper-case upstream sequences above (ARCHAEOGLOBUS, METHANOBACTERIUM, AQUIFEX, THERMOTOGA, BACILLUS)
Sequence type:                    Upstream sequences
Phylogenetic tree:                none pasted (default tree)
Motif size:                       7
Maximum number of mutations:      4
Maximum mutations per branch:     1
Subregion change cost:            0
Subregions size:                  20
Allow regulatory element losses?  No
--------------------------------------------------------------------------------
We get the following output, giving 3 possible motifs:
http://abstract.cs.washington.edu/~blanchem/FootPrinterWeb/__webquery__.fasta0.321272511125670.207088483535824.main.html#
Dialign does not identify these motifs:
http://www.genomatix.de/cgi-bin/dialign/dialign.pl?SHOW=user_23_2.seq_61222.html&TASK=dialign

I am now using the phosphoglycerate kinase family, COG0126.
This uses the proteins:
Location          Strand  Length  PID       Gene    COG      Locus     Product
1021803..1023026  -       408     11498746  pgk     COG0126  AF1146    3-phosphoglycerate kinase (pgk)
570032..571285    -       418     15668822  MJ0641  COG0126  MJ0641    phosphoglycerate kinase (pgk)
520765..521991    +       409     13541361          COG0126  TVN0530   3-phosphoglycerate kinase
1096766..1097998  +       411     14591038          COG0126  PH1218    phosphoglycerate kinase
916872..918068    -       399     15790280  pgk     COG0126  VNG1216G  glucose-6-phosphate isomerase

A cached version of the amino acid BLAST may be found at http://www.ncbi.nlm.nih.gov/cgi-bin/COG/blux?cog=AF1146&AF1146
Highlights (sequences producing significant alignments):
Sequence   Score (bits)  E value
AF1146     794           0.0
MJ0641     354           1e-97
...
TVN0530    308           8e-84
...
PH1218     290           3e-78
...
VNG1216G   281           1e-75

with protein sequences:
>gi|11498746|ref|NP_069975.1| 3-phosphoglycerate kinase (pgk) [Archaeoglobus fulgidus]
MMIDGLPTLDDIPYRGKHVLLRVDINAPIVNSTILDTSRFESHIPTIEALEDSKLVLLAHQSRPGKKDFT
SLESHASTLSKLLGKRVEYIDEIFSKGVLRRIKEMENGEVILLENVRFYSEEQLNRSAEEHAECHMVRKL
STAFDLFVNDAFSASHRSHASLVGFVPVLPSVVGRLVENEVTALSKPLKGEGRKIFVLGGAKIKDSVKVL
KNVLENNIAEKVVLTGVVANYFLMLKGYDIGEVNRKVVEDNKEDVSDEEMINILKKYSDKIILPIDLGIE
KDGVRVDIPLEKFDGKYRIMDIGLETVNQLSEIIPKYDYVVLNGPAGVFEDERFSLGTYEILRAATRAGY
SVVGGGHIASAARLFGLSDKFSHISTAGGACIRFLSGEKLVALEVIKEYWAKKWGKS
>gi|15668822|ref|NP_247625.1| phosphoglycerate kinase (pgk) [Methanococcus jannaschii]
MIMFLTLDDFNFEDKRVVLRVDINCPIDPNTGEILDDKRIREIKSTITELINKGAKVVILAHQSRPGKKD
FTTLKNHAKVLSDVIGKEVEYIDEVIGSTAREAIINMKCGDVILLENVRFYSEEVLSDWKKWENITPKKQ
AETNLIKRLAPLFDYFVNDAFAAAHRAQPSLVGFSYYMPMIAGRLMEREVGVLSKVLENPEKPCVYVLGG
AKADDSIRVMKNVLENGTADKVLTSGIVANIFLVAMGYDLGVNMDIIENLGLKSQIEIAKELLNKFEDKI
VVPVDVALNINEERVEADLNKDEKVEHLINDIGEKTIELYSEIINEAKTIVANGPAGVFEKEAFAKGTEE
LLKAIANSKGFSVIGGGHLSAAAELFGIADKIDHVSTGGGATLDFLAGEKLPVIEMLKESYKKYKGQ
>gi|13541361|ref|NP_111049.1| 3-phosphoglycerate kinase [Thermoplasma volcanium]
MADFFLMDSFDLAGRTIYLRVDINSPVNPVTGEIMGTDRFRAHVETIRKLRDSKVVIVAHQSRPGKDDFT
SLRQHAQVMSRILNKKVMFVDQLFGSLVNKTVESMNEGDIVMLENARFYSEEVDLTTLESMENSNIVKGL
STLFDYYIIDAFAAIHRAQTTLVGFRRIKPNIAGALIEKEVTMIDRFRHLNESPKIAILGGAKIDDSIAV
SENFLKSGFVDKILTGGVVANAFLWAKGIDIGKKNRDFIIKNNGDYEKLIAKCKGLLSEFGDRILVPSDF
ILSPSGERVSANGKIPDDQILADIGLDTVVEYSEIIDKAKAIFMNGPMGIYEIEAYSSGTREIFSSVAKS
EAFSIAGGGHTLSALDKLGLTNRIDHASTGGGALISYLSGEAMPVLEALKESKRLFEV
>gi|14591038|ref|NP_143113.1| phosphoglycerate kinase [Pyrococcus horikoshii]
MFRLEDFNFHNKTVFLRVDLNSPMKDGKIISDARFKAVLPTIRYLIESGAKVVIGTHQGKPYSEDYTTTE
EHARVLSELLDQHVEYIEDIFGRYAREKIKELKSGEVAILENLRFSAEEVKNKPIEECEKTFLVKKLSKV
IDYVVNDAFATAHRSQPSLVGFARIKPMIMGFLMEKEIEALMRAYYSKDSPKIYVLGGAKVEDSLKVVEN
VLRRERADLVLTGGLVANVFTLAKGFDLGRKNVEFMKKKGLLDYVKHAEEILDEFYPYIRTPVDFAVDYK
GERVEIDLLSENRGLLHQYQIMDIGKRTAEKYREILMKARIIVANGPMGVFEREEFAIGTVEVFKAIADS
PAFSVLGGGHSIASIQKYGITGITHISTGGGAMLSFFAGEELPVLRALQISYEKFKEVVK
>gi|15790280|ref|NP_280104.1| glucose-6-phosphate isomerase; Pgk [Halobacterium sp. NRC-1]
MAIRTLDDLAAANRAIGVRVDINSPLTAAGGLADDARLRAHVDTLAELLAADARVAVLAHQGRPGGDEFA
RLERHADRLDALLDAPVSYCDATFSTGARDAVADLAPGEAVVLENTRFYSEEYMAFAPERAADTALVDGL
APALDAYVNDAFAAAHRSQPSLVGFPEVLPSYAGRVMEAELDALSGVADTPTPRTYVVGGAKVPDSVEVA
AHALSHGLADNVLVTGVVANVFLAATGVDLGRASTDFIHERDYGTEIARAADLLAAHNDALHLPVDVAVE
RDGARCELSTDALPPAGDEAVCDIGSDTVDAYADVLADSETVVVNGPAGVFEDDLFADGTRGVFDAASEV
EHSIVGGGDTAAAIRRFDITGFDHVSTGGGAAINLLTDADLPAVAALR

with ClustalW output:
Please wait ...
CLUSTAL W (1.81) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: gi|11498746|ref|NP_069975.1| 407 aa
Sequence 2: gi|15668822|ref|NP_247625.1| 417 aa
Sequence 3: gi|13541361|ref|NP_111049.1| 408 aa
Sequence 4: gi|14591038|ref|NP_143113.1| 410 aa
Sequence 5: gi|15790280|ref|NP_280104.1| 398 aa
Start of Pairwise alignments
Aligning...
Sequences (4:5) Aligned. Score: 38
Sequences (3:4) Aligned. Score: 42
Sequences (2:3) Aligned. Score: 45
Sequences (1:2) Aligned. Score: 48
Sequences (3:5) Aligned. Score: 36
Sequences (1:3) Aligned. Score: 40
Sequences (2:4) Aligned. Score: 48
Sequences (1:4) Aligned. Score: 39
Sequences (2:5) Aligned. Score: 41
Sequences (1:5) Aligned.
Score: 38 Guide tree file created: [clustal.dnd] Start of Multiple Alignment There are 4 groups Aligning... Group 1: Sequences: 2 Score:6341 Group 2: Sequences: 2 Score:6178 Group 3: Sequences: 4 Score:5831 Group 4: Sequences: 5 Score:5750 Alignment Score 9543 CLUSTAL-Alignment file created [clustal.aln] -------------------------------------------------------------------------------- Clustal output CLUSTAL W (1.81) multiple sequence alignment gi|11498746|ref|NP_069975.1| MMIDGLPTLDDIPYRGKHVLLRVDINAPIVNST--ILDTSRFESHIPTIE gi|15668822|ref|NP_247625.1| -MIMFL-TLDDFNFEDKRVVLRVDINCPIDPNTGEILDDKRIREIKSTIT gi|13541361|ref|NP_111049.1| --MADFFLMDSFDLAGRTIYLRVDINSPVNPVTGEIMGTDRFRAHVETIR gi|14591038|ref|NP_143113.1| -----MFRLEDFNFHNKTVFLRVDLNSPMK--DGKIISDARFKAVLPTIR gi|15790280|ref|NP_280104.1| ---MAIRTLDDLAAANRAIGVRVDINSPLTAAGG-LADDARLRAHVDTLA : ::.: .: : :***:*.*: : . *:. *: gi|11498746|ref|NP_069975.1| ALEDS--KLVLLAHQSRPGKKDFTSLESHASTLSKLLGKRVEYIDEIFSK gi|15668822|ref|NP_247625.1| ELINKGAKVVILAHQSRPGKKDFTTLKNHAKVLSDVIGKEVEYIDEVIGS gi|13541361|ref|NP_111049.1| KLRDS--KVVIVAHQSRPGKDDFTSLRQHAQVMSRILNKKVMFVDQLFGS gi|14591038|ref|NP_143113.1| YLIESGAKVVIGTHQGKPYSEDYTTTEEHARVLSELLDQHVEYIEDIFGR gi|15790280|ref|NP_280104.1| ELLAADARVAVLAHQGRPGGDEFARLERHADRLDALLDAPVSYCDATFST * ::.: :**.:* .::: . ** :. ::. * : : :. gi|11498746|ref|NP_069975.1| GVLRRIKEMENGEVILLENVRFYSEEQL-------NRSAEEHAECHMVRK gi|15668822|ref|NP_247625.1| TAREAIINMKCGDVILLENVRFYSEEVLSDWKKWENITPKKQAETNLIKR gi|13541361|ref|NP_111049.1| LVNKTVESMNEGDIVMLENARFYSEEVD-------LTTLESMENSNIVKG gi|14591038|ref|NP_143113.1| YAREKIKELKSGEVAILENLRFSAEEVK-------NKPIEECEKTFLVKK gi|15790280|ref|NP_280104.1| GARDAVADLAPGEAVVLENTRFYSEEYM-------AFAPERAADTALVDG . : .: *: :*** ** :** . : . 
:: gi|11498746|ref|NP_069975.1| LSTAFDLFVNDAFSASHRSHASLVGFVPVLPSVVGRLVENEVTALSKPLK gi|15668822|ref|NP_247625.1| LAPLFDYFVNDAFAAAHRAQPSLVGFSYYMPMIAGRLMEREVGVLSKVLE gi|13541361|ref|NP_111049.1| LSTLFDYYIIDAFAAIHRAQTTLVGFRRIKPNIAGALIEKEVTMIDRFRH gi|14591038|ref|NP_143113.1| LSKVIDYVVNDAFATAHRSQPSLVGFARIKPMIMGFLMEKEIEALMRAYY gi|15790280|ref|NP_280104.1| LAPALDAYVNDAFAAAHRSQPSLVGFPEVLPSYAGRVMEAELDALSGVAD *: :* : ***:: **::.:**** * * ::* *: : gi|11498746|ref|NP_069975.1| G-EGRKIFVLGGAKIKDSVKVLKNVLENNIAEKVVLTGVVANYFLMLKGY gi|15668822|ref|NP_247625.1| NPEKPCVYVLGGAKADDSIRVMKNVLENGTADKVLTSGIVANIFLVAMGY gi|13541361|ref|NP_111049.1| LNESPKIAILGGAKIDDSIAVSENFLKSGFVDKILTGGVVANAFLWAKGI gi|14591038|ref|NP_143113.1| SKDSPKIYVLGGAKVEDSLKVVENVLRRERADLVLTGGLVANVFTLAKGF gi|15790280|ref|NP_280104.1| T-PTPRTYVVGGAKVPDSVEVAAHALSHGLADNVLVTGVVANVFLAATGV ::**** **: * : * .: :: *:*** * * gi|11498746|ref|NP_069975.1| DIGEVNRKVVEDNK--EDVSDEEMINILKKYSDKIILPIDLGIEKDGVRV gi|15668822|ref|NP_247625.1| DLG-VNMDIIENLG--LKSQIEIAKELLNKFEDKIVVPVDVALNINEERV gi|13541361|ref|NP_111049.1| DIGKKNRDFIIKNNGDYEKLIAKCKGLLSEFGDRILVPSDFILSPSGER- gi|14591038|ref|NP_143113.1| DLGRKNVEFMKKKG--LLDYVKHAEEILDEFYPYIRTPVDFAVDYKGERV gi|15790280|ref|NP_280104.1| DLGRASTDFIHERD--YGTEIARAADLLAAHNDALHLPVDVAVERDGARC *:* . ..: . :* . : * *. :. . * gi|11498746|ref|NP_069975.1| D---IPLEKFDG--KYRIMDIGLETVNQLSEIIPKYDYVVLNGPAGVFED gi|15668822|ref|NP_247625.1| E---ADLNKDEKV-EHLINDIGEKTIELYSEIINEAKTIVANGPAGVFEK gi|13541361|ref|NP_111049.1| ----VSANGKIPD-DQILADIGLDTVVEYSEIIDKAKAIFMNGPMGIYEI gi|14591038|ref|NP_143113.1| EIDLLSENRGLLH-QYQIMDIGKRTAEKYREILMKARIIVANGPMGVFER gi|15790280|ref|NP_280104.1| E---LSTDALPPAGDEAVCDIGSDTVDAYADVLADSETVVVNGPAGVFED : . : *** * ::: . :. 
*** *::* gi|11498746|ref|NP_069975.1| ERFSLGTYEILRAATR-AGYSVVGGGHIASAARLFGLSDKFSHISTAGGA gi|15668822|ref|NP_247625.1| EAFAKGTEELLKAIANSKGFSVIGGGHLSAAAELFGIADKIDHVSTGGGA gi|13541361|ref|NP_111049.1| EAYSSGTREIFSSVAKSEAFSIAGGGHTLSALDKLGLTNRIDHASTGGGA gi|14591038|ref|NP_143113.1| EEFAIGTVEVFKAIADSPAFSVLGGGHSIASIQKYGITG-ITHISTGGGA gi|15790280|ref|NP_280104.1| DLFADGTRGVFDAASE-VEHSIVGGGDTAAAIRRFDITG-FDHVSTGGGA : :: ** :: : : .*: ***. :: .::. : * **.*** gi|11498746|ref|NP_069975.1| CIRFLSGEKLVALEVIKEYWAKKWGKS- gi|15668822|ref|NP_247625.1| TLDFLAGEKLPVIEMLKESYKKYKGQ-- gi|13541361|ref|NP_111049.1| LISYLSGEAMPVLEALKESKRLFEV--- gi|14591038|ref|NP_143113.1| MLSFFAGEELPVLRALQISYEKFKEVVK gi|15790280|ref|NP_280104.1| AINLLTDADLPAVAALR----------- : ::. : .: :: -------------------------------------------------------------------------------- Tree construction CLUSTAL W (1.81) Multiple Sequence Alignments Sequence format is Clustal Sequence 1: gi|11498746|ref|NP_069975.1| 428 aa Sequence 2: gi|15668822|ref|NP_247625.1| 428 aa Sequence 3: gi|13541361|ref|NP_111049.1| 428 aa Sequence 4: gi|14591038|ref|NP_143113.1| 428 aa Sequence 5: gi|15790280|ref|NP_280104.1| 428 aa Phylogenetic tree file created: [clustal.ph] +-----------------------------------1:gi|11498746|ref|NP_069975.1| +--7 | +---------------------------2:gi|15668822|ref|NP_247625.1| --6 | +-------------------------------------3:gi|13541361|ref|NP_111049.1| | +--8 +--9 +----------------------------------4:gi|14591038|ref|NP_143113.1| | +-----------------------------------------5:gi|15790280|ref|NP_280104.1| Midpoint-rooted tree: +--------------------------------------1:gi|11498746|ref|NP_069975.1| +--7 | +-----------------------------2:gi|15668822|ref|NP_247625.1| +--9 | | +----------------------------------------3:gi|13541361|ref|NP_111049.1| --6 +--8 | +--------------------------------------4:gi|14591038|ref|NP_143113.1| | +-------------------------------------------5:gi|15790280|ref|NP_280104.1| Remember, this is an 
unrooted tree!
--------------------------------------------------------------------------------
( ( gi|11498746|ref|NP_069975.1|:0.28068, gi|15668822|ref|NP_247625.1|:0.22056) :0.02234, ( gi|13541361|ref|NP_111049.1|:0.29515, gi|14591038|ref|NP_143113.1|:0.27485) :0.00597, gi|15790280|ref|NP_280104.1|:0.32480);
--------------------------------------------------------------------------------
Bootstrapping tree
CLUSTAL W (1.81) Multiple Sequence Alignments
Sequence format is Clustal
Sequence 1: gi|11498746|ref|NP_069975.1| 428 aa
Sequence 2: gi|15668822|ref|NP_247625.1| 428 aa
Sequence 3: gi|13541361|ref|NP_111049.1| 428 aa
Sequence 4: gi|14591038|ref|NP_143113.1| 428 aa
Sequence 5: gi|15790280|ref|NP_280104.1| 428 aa
Phylogenetic tree file created: [clustal.ph]
No bootstrap generated
This is probably caused by too few input sequences.
CLUSTAL needs more than three sequences in order to be able to calculate a phylogenetic tree.
--------------------------------------------------------------------------------
RunClustalW--V1.4--21-Oct-2002/JackL

This gives the following upstream regions (corrected as necessary for negative strand):
>ARCHAEOGLOBUS
TGAACCAATTCTGAGCAGGTAAATTAAAATTTTCAGAAAGTT
TTTAATTTCACCTTCCGAATTGAAG
>METHANOCOCCUS
ATCACATCAGTTATTAAAATTAACTTAATAATTATTTAAGATTTCT
TTATATTTATTCTTTCTGCAAAAACCTTAAAAACTTTAAAATGATAATTAGGAAATATCTAAGAAAAGTT
TCTACAAATGACGATAATCTATTAAAACTTCTAAAAACATAAAAATCTTAGA
>THERMOPLASMA
TGCACTGGTAAGATCAATTCATTAAATTACTTTTCTGCCATA
AAAATAAATTTAATAATAGTCCATATTAAGGATCA
>PYROCOCCUS
ATTGCATT
CCTTTCTATTCTTATCTTCCCCTATATTGGCATAATATTCTTAACCTTCGTTTTTTATTGAAATTTGGTG
GTGAAA
>HALOBACTERIUM
AACGAGCCGCCTGCACGCACCTTTACTCGGTCCCGAGTACTGGCGTTCGGG
GCGGCTGTCCGTACGGAAACGCATTTACAGGCACACGCAGTGGCTCCGATACA

and the correspondingly relabelled ClustalW tree:
( ( ARCHAEOGLOBUS:0.28068, METHANOCOCCUS:0.22056) :0.02234, ( THERMOPLASMA:0.29515, PYROCOCCUS:0.27485) :0.00597, HALOBACTERIUM:0.32480);

Using the default tree, we get more motifs in the output than with enolase, so I can be a bit more selective.
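The upstream regions above were "corrected as necessary for negative strand". For a gene on the "-" strand, the 5' flank lies just after the gene's end coordinate on the forward strand and has to be reverse-complemented before it reads in the gene's direction. A minimal Python sketch of that step, assuming the genome is available as a plain string and a fixed window length (the window length is an assumption; the regions listed here vary in length). `upstream_region` is a hypothetical helper, not a library function, and wrap-around on circular chromosomes is not handled.

```python
# Complement table for DNA; characters outside it (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` nt immediately 5' of a gene, read along the gene.

    `start` and `end` are 1-based inclusive coordinates with start < end, in the
    style of the "899543..900826 -" entries in the protein tables above.
    """
    if strand == "+":
        # 5' of a '+' gene: the bases just before `start` on the forward strand.
        return genome[max(0, start - 1 - length):start - 1]
    if strand == "-":
        # 5' of a '-' gene: the bases just after `end` on the forward strand,
        # reverse-complemented so they read 5'->3' towards the start codon.
        return reverse_complement(genome[end:end + length])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

For example, TM0877 (899543..900826, "-") would correspond to `upstream_region(genome, 899543, 900826, "-", n)` on the T. maritima chromosome string, for some chosen window `n`.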
--------------------------------------------------------------------------------
FootPrinter 2.0 web server parameters:
Input sequences (FASTA):          the five upper-case upstream sequences above (ARCHAEOGLOBUS, METHANOCOCCUS, THERMOPLASMA, PYROCOCCUS, HALOBACTERIUM)
Sequence type:                    Upstream sequences
Phylogenetic tree:                none pasted (default tree)
Motif size:                       7
Maximum number of mutations:      4
Maximum mutations per branch:     1
Subregion change cost:            0
Subregions size:                  100
Allow regulatory element losses?  No
--------------------------------------------------------------------------------
http://abstract.cs.washington.edu/~blanchem/FootPrinterWeb/__webquery__.fasta0.06602635476471620.382073836905089.main.html#
This gives 3 possible motifs. Interestingly, 2 of them occur in the same order in all of the prokaryotes. Dialign does not find these.
http://www.genomatix.de/cgi-bin/dialign/dialign.pl?SHOW=user_23_3.seq_31378.html&TASK=dialign

Using the phylogenetic tree produced by ClustalW, we get a much sparser list of motifs:
--------------------------------------------------------------------------------
FootPrinter 2.0 web server parameters:
Input sequences (FASTA):          the same five upstream sequences (ARCHAEOGLOBUS, METHANOCOCCUS, THERMOPLASMA, PYROCOCCUS, HALOBACTERIUM)
Sequence type:                    Upstream sequences
Phylogenetic tree:                ( ( ARCHAEOGLOBUS:0.28068, METHANOCOCCUS:0.22056) :0.02234, ( THERMOPLASMA:0.29515, PYROCOCCUS:0.27485) :0.00597, HALOBACTERIUM:0.32480);
Motif size:                       7
Maximum number of mutations:      4
Maximum mutations per branch:     1
Subregion change cost:            1
Subregions size:                  100
Allow regulatory element losses?  No
--------------------------------------------------------------------------------
http://abstract.cs.washington.edu/~blanchem/FootPrinterWeb/__webquery__.fasta0.7711191343582580.403699182710749.main.html#
This gives 1 motif, which Dialign does find:
http://www.genomatix.de/cgi-bin/dialign/dialign.pl?SHOW=user_23_3.seq_31378.html&TASK=dialign
Adjusting the subregion options does not change the results.
Allowing regulatory element loss, or increasing the motif size or the number of allowed mutations, degrades the results.
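The "Maximum number of mutations" setting used throughout is a parsimony bound: roughly, FootPrinter reports a set of substrings as a motif only if the tree can explain them with at most that many point mutations. The sketch below scores one fixed choice of 7-mers with Fitch's small-parsimony algorithm. It is a simplification, not FootPrinter's algorithm: FootPrinter searches over all substrings of each input sequence (substring parsimony) and also applies the per-branch limit and subregion costs, none of which is modelled here. The tree is the relabelled ClustalW phosphoglycerate kinase tree, rooted arbitrarily on the HALOBACTERIUM branch; the 7-mers are toy strings invented for illustration, not motifs reported by FootPrinter.

```python
# Rooted binary version of the relabelled ClustalW pgk tree. For a binary
# tree the Fitch score does not depend on where the root is placed.
TREE = ((("ARCHAEOGLOBUS", "METHANOCOCCUS"), ("THERMOPLASMA", "PYROCOCCUS")), "HALOBACTERIUM")

# Toy 7-mers invented for illustration; these are NOT FootPrinter's motifs.
TOY_MOTIFS = {
    "ARCHAEOGLOBUS": "TTTAATT",
    "METHANOCOCCUS": "TTTAATT",
    "THERMOPLASMA": "TTAAATT",
    "PYROCOCCUS": "TTTATTT",
    "HALOBACTERIUM": "TTTACTT",
}

MAX_MUTATIONS = 4  # "Maximum number of mutations" used in the runs above

def fitch(node, column):
    """Fitch's algorithm for one column: returns (candidate states, mutations)."""
    if isinstance(node, str):  # leaf: the observed base
        return {column[node]}, 0
    left, right = node
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:  # the two subtrees can agree, so no extra mutation is needed
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def motif_parsimony(tree, motifs):
    """Minimum number of point mutations on `tree` that explains `motifs`."""
    length = len(next(iter(motifs.values())))
    if any(len(m) != length for m in motifs.values()):
        raise ValueError("all motifs must have the same length")
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in motifs.items()}
        total += fitch(tree, column)[1]
    return total

if __name__ == "__main__":
    score = motif_parsimony(TREE, TOY_MOTIFS)
    print(score, score <= MAX_MUTATIONS)  # 3 True
```

Here only columns 3 and 5 vary (one and two changes respectively), so the toy set costs 3 mutations and would fall within a limit of 4.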