ProteinID   Organism                   Length of noncoding sequence
16331117    Synechocystis              181
15618801    Chlamydophila pneumoniae   1077
15645855    Helicobacter pylori        1107
15673706    Lactococcus                266

I followed almost the same procedure outlined in the handout. Selecting the COGID was not very frustrating, although I had to try around five COGIDs before arriving at the set of proteins above. The protein (amino acid) sequences of these proteins are:

>sll0362
MTTTPPVLSGPEIRQQFLNFFADRQHQILPSASLVPEDPTVLLTIAGMLPFKPIFLGQKSAEFPRATTSQ
KCIRTNDIENVGRTARHHTFFEMLGNFSFGDYFKSQAIAWAWELSTQVFKLPAERLVVSVFEEDDEAFAI
WRDEIGIPAHRIQRMGADDNFWVSGPTGPCGPCSEIYYDFHPELGDEKLDLEDDSRFIEFYNLVFMQYNR
DNAGNLTPLEKKNIDTGMGLERMAQILQKVPNNYETDLIFPIIQTAANIAGIDYAQANEKTKVSLKVIGD
HVRSVVHMIADGISASNLGRGYVLRRLIRRVVRHGRLLGINGEFTTKVAATAVQLAQPVYPNVLERQSLI
EQELQREEAAFLKTLERGEKLLADLMADGVTEIAGADAFTLYDTFGFPLELTQEIAEEQGITVDVEGFEK
AMQEQQERSKAAHETIDLTVQESLDKLANHIHPTEFLGYTDLQSSAIVKAVLVGGELVDQAVAGQTVQIV
LDQTPFYGESGGQIGDKGFLNGDNLLIRIEDVKRESGIFIHFGRVERGTVQIGTTITATIDRACRRRAQA
NHTATHLLQSALKRVVDEGISQAGSLVDFNRLRFDFNSPRAVTMEELQQIEDLINQWIAEAHQTEVAVMP
IADAKAKGAIAMFGEKYGAEVRVIDVPGVSLELCGGTHVANTAEIGLFKIVAETGIAAGVRRIEAVAGPS
VLDYLNVREAVVKELGDRLKAKPEEIPDRVHQLQQELKASQKQLEALKQELALQKSEQLLTQAQTVGEFK
ILVADLGTVDGESLKTAAERLQQKLGESAVVLASIPEEGKVSLVAAFSPQLVKTKQLKAGQFIGAIAKIC
GGGGGGRPNLAQAGGRDASKLPEALATAKQTLLAELG
>CPn0892
MLSNTIRSNFLKFYANRHHTILPSSPVFPHNDPSILFTNAGMNQFKDIFLNKEKVSYSRATTSQKCIRAG
GKHNDLDNVGHTSRHLTFFEMLGNFSFGDYFKAEAIAFAWEVSLSVFNFNPEGIYATVHEKDDEAFALWE
AYLPTDRIFRLTDKDNFWSMANTGPCGYCSELLFDRGPSFGNASSPLDDTDGERFLEYWNLVFMEFNRTS
EGSLLALPNKHVDTGAGLERLVSLIAGTHTVFEADVLRELIAKTEQLSGKVYHPDDSGAAFRVIADHVRS
LSFAIADGLLPGNTERGYVLRKILRRSVNYGRRLGFRNPFLAEIVPSLADAMGEAYPELKNSLSQIQKVL
TLEEESFFKTLDRGGNLLQQVLKSSSSSSCISGEDAFKLKDTYGMPIDEISLLAKDYDYSVDMDTFHKLE
QEAKERSRKNVVQSQGTSESIYNELHLTSEFIGYDHLSCDTFIEAIISKDHIVSSLQEKQEGAIVLKVSP
FYAEKGGQVGDSGEIFCSEGTFIVTHTTSPKAGLIVHHGRISQGSLTVEAAVTAQVNRYRRKRIANNHTA
CHLLHKALEITLGDHIRQAGSYVDDTKIRLDFTHPQAISPEDLLCIETLVNESIRENEPVDIREALYSDV
MNSSEIKQFFGDKYSDVVRVVSAGHSHELCGGTHAEATGDIGFFRITKEHAVAMGIRRIEAVTGEKAEAT
VHQQSEVLEEIATLLQVPRDQIVSRLTATLDERKQQDKRLNELENSLIQTKLDKLIHNCHQRQGITCLVH
HLAEHENHRLQQYAQCLHQRIPEKLISLWTTEKNGKYIVLSRVSDDLITQGVHAQDLLKAVLTPCGGRWG
GKDQSAQGSAPALPATEVLNETLWQWISTQLI
>HP1241
MDIRNEFLQFFQNKGHAVYPSMPLVPNDATLLFTNAGMVQFKDIFTGIVPRPSIPRAASSQLCMRAGGKH
NDLENVGYTARHHTLFEMLGNFSFGDYFKEEAILFAWEFVTKNLGFKPKDLYISVHEKDDEAVKLWEKFV
PVDRIKKMGDKDNFWQMGDSGPCGPCSEIYIDQGEKHFKGSEDYFGGEGDRFLEIWNLVFMQYERSNDGV
LSPLPKPSIDTGMGLERVQALLEHKLNNFDSSLFAPLMEEISELTSLDYASEFQPSFRVVADHARAVAFL
LAQGVHFNKEGRGYVLRRILRRALRHGYLMGLKEAFLYKVVGVVCEQFANTHAYLKESKEMVVKECFEEE
EHFLETLESGMELFNLSLKHLNENKIFDGKIAFKLYDTFGFPLDLTNDMLRSHGACADMQGFELCMQEQV
KRSKASWKGKQNNADFSAILNAYAPNVFVGYETTECSAKVLGFFDSDFKEITDANPNQEVWVLLEKTPFY
AEGGGAIGDRGALFKDNGEVAIVLDTKNFFGLNFSLLEIKKALKKGDQVIAQVSDERFEIAKHHSATHLL
QSALREVLGSHVSQAGSLVESKRLRFDFSHAKALNDEELEKVEDLVNAQIFKHLNSQVEHMPLNQAKDKG
ALALFSEKYAENVRVVSFKEASIELCGGIHVENTGLIGGFRIVKESGVSSGVRRIEAVCGKAFYQLAKEE
NKELKNAKTLLKNNDVIAGINKLKESVKNSQKAPVSMDLPVEKIHGVNLVVGVVEQGDIKEMIDRLKSKH
ERLLAMVFKKENERITLACGVKNAPIKANVWANEVAQILGGKGGGRGDFASAGGKDIENLQAALNLAKNT
ALKALEG
>L0343
MKTMTSAEVRQMFLDFFKSKGHTVEPSQSLVPVNDPTLLWINSGVATLKKYFDGSVVPENPRLTNAQKAI
RTNDIENVGKTARHHTMFEMLGNFSIGDYFRKEAIAFAWELLTSSEWFEFPAEKLYITYYPADKDTYNRW
VEVGVDPTHLVPIEDNFWEIGAGPSGPDTEIFFDRGEVYDPEHVGLKLLAEDIENDRYIEIWNIVLSQFN
ADPSIPRSEYPELPQKNIDTGMGLERMVCIIQGGKTNFDTDLFLPIIREIEKLSGKTYSPDSENMSFKVI
ADHIRSLSFAIGDGALPGNEGRGYVLRRLLRRAVMHGKKLGIQGKFLASLVPTVGKIMQSYYPEVLEKED
FIMQIIDREEETFNRTIDAGQKLIDELLLNLKSEGKDRLEGADIFRLYDTYGFPVELTEELAEDEGFKID
HEGFKVAMKAQQERARAAVVKGGSMGAQNETLSSIEVESEFLYEDKTTQGKLLVSIQDDEIVDEVSGKAQ
LVFDVTPFYAEMGGQVADHGVIKDAEGQVVANVLDVQHAPHGQNLHSVETLSPLKVGESYTLEIDKERRA
AVVKNHTATHLLHAALHNIVGNHALQAGSLNEVEFLRFDFTHFAQVTKEELAEIERQVNEVIWQSLKVET
VETDIATAKEMGAMALFGEKYGKNVRVVKIGDYSIELCGGTHTQTTSEIGLFKIVKEEGIGSGVRRIIAV
TGQKAYEAFKDAENTLNEVATMVKAPQTSQVLAKVTSLQDELKTAQKENDALAGKLAASQSDEIFKNVQT
AGSLNFIASEVTVPDANGLRNLADIWKQKELSDVLVLVAKIGEKVSLLVASKSSDVKAGNLVKELAPFVD
GRGGGKPDMAMAGGSKAAGIPELLAAVAEKLA

CLUSTAL W (1.81) Multiple Sequence Alignments

Sequence format is Pearson
Sequence 1: sll0362   877 aa
Sequence 2: CPn0892   872 aa
Sequence 3: HP1241    847 aa
Sequence 4: L0343     872 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 33.3716
Sequences (1:3) Aligned. Score: 34.2385
Sequences (1:4) Aligned. Score: 36.2385
Sequences (2:2) Aligned. Score: 100
Sequences (2:3) Aligned. Score: 32.4675
Sequences (2:4) Aligned. Score: 30.5046
Sequences (3:2) Aligned. Score: 32.1133
Sequences (3:3) Aligned. Score: 100
Sequences (3:4) Aligned. Score: 30.1063
Sequences (4:2) Aligned. Score: 30.5046
Sequences (4:3) Aligned. Score: 30.1063
Sequences (4:4) Aligned. Score: 100
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences: 2 Score:12780
Group 2: Sequences: 3 Score:9601
Group 3: Sequences: 4 Score:9700
Alignment Score 10381

--------------------------------------------------------------------------------
Clustal output

CLUSTAL W (1.81) multiple sequence alignment

sll0362         MTTTPPVLSGPEIRQQFLNFFADRQHQILPSASLVPE-DPTVLLTIAGMLPFKPIFLGQK
L0343           MKT----MTSAEVRQMFLDFFKSKGHTVEPSQSLVPVNDPTLLWINSGVATLKKYFDGSV
HP1241          ----------MDIRNEFLQFFQNKGHAVYPSMPLVPN-DATLLFTNAGMVQFKDIFTGIV
CPn0892         -------MLSNTIRSNFLKFYANRHHTILPSSPVFPHNDPSILFTNAGMNQFKDIFLNKE
                            :*. **.*: .: * : ** .:.*  *.::*   :*:  :*  * .

sll0362         S-AEFPRATTSQKCIRT----NDIENVGRTARHHTFFEMLGNFSFGDYFKSQAIAWAWEL
L0343           V-PENPRLTNAQKAIRT----NDIENVGKTARHHTMFEMLGNFSIGDYFRKEAIAFAWEL
HP1241          PRPSIPRAASSQLCMRAGGKHNDLENVGYTARHHTLFEMLGNFSFGDYFKEEAILFAWEF
CPn0892         K-VSYSRATTSQKCIRAGGKHNDLDNVGHTSRHLTFFEMLGNFSFGDYFKAEAIAFAWEV
                   . .* :.:* .:*:    **::*** *:** *:********:****: :** :***.

sll0362         ST--QVFKLPAERLVVSVFEEDDEAFAIWRDEIGIPAHRIQRMGADDNFWVSGPTGPCGP
L0343           LTSSEWFEFPAEKLYITYYPADKDTYNRWVEVGVDPTHLVP---IEDNFWEIG-AGPSGP
HP1241          VT--KNLGFKPKDLYISVHEKDDEAVKLWEKF--VPVDRIKKMGDKDNFWQMGDSGPCGP
CPn0892         SLS--VFNFNPEGIYATVHEKDDEAFALWEAY--LPTDRIFRLTDKDNFWSMANTGPCGY
                      : : .: :  : .  *.::   *      *.. :     .****  . :**.*

sll0362         CSEIYYD----FHPELGDEKLDLE--DDSRFIEFYNLVFMQYNRD---NAGNLTPLEKKN
L0343           DTEIFFDRGEVYDPEHVGLKLLAEDIENDRYIEIWNIVLSQFNADPSIPRSEYPELPQKN
HP1241          CSEIYID----QGEKHFKGSEDYFGGEGDRFLEIWNLVFMQYERS---NDGVLSPLPKPS
CPn0892         CSELLFD----RGPSFGNASSPLDDTDGERFLEYWNLVFMEFNRT---SEGSLLALPNKH
                 :*:  *       .    .      :..*::* :*:*: :::       .    * :

sll0362         IDTGMGLERMAQILQKVPNNYETDLIFPIIQTAANIAGIDYAQANEKTKVSLKVIGDHVR
L0343           IDTGMGLERMVCIIQGGKTNFDTDLFLPIIREIEKLSGKTYSPDSEN--MSFKVIADHIR
HP1241          IDTGMGLERVQALLEHKLNNFDSSLFAPLMEEISELTSLDYASEFQP---SFRVVADHAR
CPn0892         VDTGAGLERLVSLIAGTHTVFEADVLRELIAKTEQLSGKVYHPDDSG--AAFRVIADHVR
                :*** ****:  ::    . :::.::  ::    :::.  *    .    :::*:.** *

sll0362         SVVHMIADGISASNLGRGYVLRRLIRRVVRHGRLLGINGEFTTKVAATAVQLAQPVYPNV
L0343           SLSFAIGDGALPGNEGRGYVLRRLLRRAVMHGKKLGIQGKFLASLVPTVGKIMQSYYPEV
HP1241          AVAFLLAQGVHFNKEGRGYVLRRILRRALRHGYLMGLKEAFLYKVVGVVCEQFANTHAYL
CPn0892         SLSFAIADGLLPGNTERGYVLRKILRRSVNYGRRLGFRNPFLAEIVPSLADAMGEAYPEL
                :: . :.:*   .:  ******:::** : :*  :*:.  *  .:.    .     :. :

sll0362         LERQSLIEQELQREEAAFLKTLERGEKLLADLMA----DGVTEIAGADAFTLYDTFGFPL
L0343           LEKEDFIMQIIDREEETFNRTIDAGQKLIDELLLNLKSEGKDRLEGADIFRLYDTYGFPV
HP1241          KESKEMVVKECFEEEEHFLETLESGMELFNLSLKHLN--ENKIFDGKIAFKLYDTFGFPL
CPn0892         KNSLSQIQKVLTLEEESFFKTLDRGGNLLQQVLKSSS--SSSCISGEDAFKLKDTYGMPI
                 :  . : :    **  * .*:: * :*:   :          : *   * * **:*:*:

sll0362         ELTQEIAEEQGITVDVEGFEKAMQEQQERSKAAHETIDLTVQESLDKLANHIHPTEFLGY
L0343           ELTEELAEDEGFKIDHEGFKVAMKAQQERARAAVVKGG-SMGAQNETLSSIEVESEFL-Y
HP1241          DLTNDMLRSHGACADMQGFELCMQEQVKRSKASWKGKQ--NNADFSAILNAYAPNVFVGY
CPn0892         DEISLLAKDYDYSVDMDTFHKLEQEAKERSRKNVVQSQ---GTSESIYNELHLTSEFIGY
                :  . : .. .   * : *.   :   :*::            . .   .    . *: *

sll0362         TDLQSSAIVKAVLV-GGELVDQAVAGQTVQIVLDQTPFYGESGGQIGDKGFLNGDN--LL
L0343           EDKTTQGKLLVSIQ-DDEIVDEVSG--KAQLVFDVTPFYAEMGGQVADHGVIKDAEGQVV
HP1241          ETTECSAKVLGFFDSDFKEITDANPNQEVWVLLEKTPFYAEGGGAIGDRGALFKDNG-EV
CPn0892         DHLSCDTFIEAIIS-KDHIVSSLQEKQEGAIVLKVSPFYAEKGGQVGDSGEIFCSEG-TF
                     .  :   :    . : .        :::. :***.* ** :.* * :   :   .

sll0362         IRIEDVKRESGIFIHFGRVERGTVQIGTTITATIDRACRRRAQANHTATHLLQSALKRVV
L0343           ANVLDVQHAPHGQNLHSVETLSPLKVGESYTLEIDKERRAAVVKNHTATHLLHAALHNIV
HP1241          AIVLDTKN-FFGLNFSLLEIKKALKKGDQVIAQVSDE-RFEIAKHHSATHLLQSALREVL
CPn0892         IVTHTTSPKAGLIVHHGRISQGSLTVEAAVTAQVNRYRRKRIANNHTACHLLHKALEITL
                     ..               .:         :.   *     :*:* ***: **.  :

sll0362         DEGISQAGSLVDFNRLRFDFNSPRAVTMEELQQIEDLINQWIAEAHQTEVAVMPIADAKA
L0343           GNHALQAGSLNEVEFLRFDFTHFAQVTKEELAEIERQVNEVIWQSLKVETVETDIATAKE
HP1241          GSHVSQAGSLVESKRLRFDFSHAKALNDEELEKVEDLVNAQIFKHLNSQVEHMPLNQAKD
CPn0892         GDHIRQAGSYVDDTKIRLDFTHPQAISPEDLLCIETLVNESIRENEPVDIREALYSDVMN
                ..   ****  :   :*:**.    :. *:*  :*  :*  * :    :        .

sll0362         KG-AIAMFGEKYGAEVRVIDVPGVSLELCGGTHVANTAEIGLFKIVAETGIAAGVRRIEA
L0343           MG-AMALFGEKYGKNVRVVKIGDYSIELCGGTHTQTTSEIGLFKIVKEEGIGSGVRRIIA
HP1241          KG-ALALFSEKYAENVRVVSFKEASIELCGGIHVENTGLIGGFRIVKESGVSSGVRRIEA
CPn0892         SSEIKQFFGDKYSDVVRVVSAG-HSHELCGGTHAEATGDIGFFRITKEHAVAMGIRRIEA
                 .    :*.:**.  ***:.    * ***** *.  *. ** *:*. * .:. *:*** *

sll0362         VAGPSVLDYLNVREAVVKELGDRLKAKPE-EIPDRVHQLQQELKASQKQLEALKQELALQ
L0343           VTGQKAYEAFKDAENTLNEVATMVKAPQTSQVLAKVTSLQDELKTAQKENDALAGKLAAS
HP1241          VCGKAFYQLAKEENKELKNAKTLLK---NNDVIAGINKLKESVKNSQKAPVSMDLPVEKI
CPn0892         VTGEKAEATVHQQSEVLEEIATLLQVPRD-QIVSRLTATLDERKQQDKRLNELENSLIQT
                * *       :  .  :::    ::     ::   :    :. *  :*    :   :

sll0362         KSEQLLTQAQTVGEFKILVADLGTVDGESLKTAAERLQQK-LGESAVVLASIPEEGKVSL
L0343           QSDEIFKNVQTAGSLNFIASEVTVPDANGLRNLADIWKQKELSDVLVLVAKIGE--KVSL
HP1241          HG---------------VNLVVGVVEQGDIKEMIDRLKSKHERLLAMVFKKENE--RITL
CPn0892         KLDKLIHNCHQRQGITCLVHHLAEHENHRLQQYAQCLHQRIPEKLISLWTTEKNGKYIVL
                :                :   :   :   ::   :  :.:       :  .  :   : *

sll0362         VAAFSPQLVKTKQLKAGQFIGAIAKICGGGGGGRPNLAQAGGRDASKLPEALATAKQTLL
L0343           LVAS-----KSSDVKAGNLVKELAPFVDGRGGGKPDMAMAGGSKAAGIPELLAAVAEKLA
HP1241          ACGV-----KNAPIKANVWANEVAQILGGKGGGRGDFASAGGKDIENLQAALNLAKNTAL
CPn0892         SRVS--DDLITQGVHAQDLLKAVLTPCGGRWGGKDQSAQGSAPALPATEVLNETLWQWIS
                          .  ::*      :    .*  **: : * ...              :

sll0362         AELG-
L0343           -----
HP1241          KALEG
CPn0892         TQLI-

--------------------------------------------------------------------------------
Tree construction
CLUSTAL W (1.81) Multiple Sequence Alignments

Sequence format is Clustal
Sequence 1: sll0362   905 aa
Sequence 2: L0343     905 aa
Sequence 3: HP1241    905 aa
Sequence 4: CPn0892   905 aa
Phylogenetic tree file created: [clustal.ph]

     +----------------------------------------------------1:sll0362
  +--6
  |  +---------------------------------------------------------2:L0343
--5
  |  +-----------------------------------------------------------3:HP1241
  +--7
     +----------------------------------------------------------------4:CPn0892

Midpoint-rooted tree:

         +--------------------------------------------------------1:sll0362
     +---6
  +--7   +-------------------------------------------------------------2:L0343
  |  |
--5  +----------------------------------------------------------------3:HP1241
  |
  +-------------------------------------------------------------------4:CPn0892

Remember, this is an unrooted tree!

--------------------------------------------------------------------------------
Bootstrapping tree
CLUSTAL W (1.81) Multiple Sequence Alignments

Sequence format is Clustal
Sequence 1: sll0362   905 aa
Sequence 2: L0343     905 aa
Sequence 3: HP1241    905 aa
Sequence 4: CPn0892   905 aa
Phylogenetic tree file created: [clustal.ph]
No bootstrap generated
This is probably caused by too few input sequences.
CLUSTAL needs more than three sequences in order to be able to calculate a phylogenetic tree.
--------------------------------------------------------------------------------
RunClustalW--V1.4--21-Oct-2002/JackL
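One way to confirm that the four FASTA records were copied completely is to count the residues in each record and compare them with the lengths ClustalW reports (877, 872, 847 and 872 aa). A minimal Python sketch of that check; the file name `cog_proteins.fasta` is a hypothetical placeholder for wherever the records are saved, and the record id is assumed to be the first word after `>`:

```python
from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record id: sequence}.

    The record id is the first whitespace-separated word after '>'.
    Sequence lines are concatenated with all whitespace removed.
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is not None:
            chunks.append("".join(line.split()))
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical file holding the sll0362, CPn0892, HP1241 and L0343 records.
    path = Path("cog_proteins.fasta")
    if path.exists():
        for seq_id, seq in parse_fasta(path.read_text()).items():
            print(f"{seq_id}\t{len(seq)} aa")
```

If the file holds the four records exactly as listed, each printed length should match the corresponding "Sequence n: ... aa" line of the ClustalW log.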
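The `*`, `:` and `.` marks under each alignment block form ClustalW's conservation line: `*` for a column where all four residues are identical, `:` when they all fall in one "strong" residue group, `.` when they fall in one "weak" group, and a blank when the column contains a gap or no group fits. The sketch below reproduces that rule in Python. The group lists are the ones given in the ClustalW/ClustalX documentation; treat them as an assumption if your version differs.

```python
# Residue groups used for the Clustal conservation line
# (as listed in the ClustalW/ClustalX documentation; verify for your version).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column."""
    if "-" in column:
        return " "  # columns containing a gap get no mark
    residues = set(column.upper())
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(rows: list[str]) -> str:
    """Build the conservation line for equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*rows))


if __name__ == "__main__":
    # Last nine columns of the second alignment block (sll0362, L0343, HP1241, CPn0892).
    block = ["QAIAWAWEL", "EAIAFAWEL", "EAILFAWEF", "EAIAFAWEV"]
    print(conservation_line(block))
```

For those nine columns the function yields `:** :***.`, the same marks ClustalW printed at the end of that block.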
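The grouping in the unrooted tree (sll0362 with L0343 at node 6, HP1241 with CPn0892 at node 7) can be sanity-checked against the pairwise scores in the ClustalW log. The sketch below treats each score as percent identity and converts it to a distance d = 1 - score/100. This is an assumption for illustration only, since ClustalW builds its tree from its own distances on the full alignment. It uses the (i:j) entries with i < j. For exactly four sequences, the first neighbor-joining step picks the pair (i, j) that minimises d(i,j) + d(k,l), because Q(i,j) = d(i,j) + d(k,l) - (sum of all six distances).

```python
# Pairwise scores copied from the ClustalW log (entries with i < j only).
SCORES = {
    ("sll0362", "CPn0892"): 33.3716,
    ("sll0362", "HP1241"): 34.2385,
    ("sll0362", "L0343"): 36.2385,
    ("CPn0892", "HP1241"): 32.4675,
    ("CPn0892", "L0343"): 30.5046,
    ("HP1241", "L0343"): 30.1063,
}


def distance(a: str, b: str) -> float:
    """Convert a percent-identity score into a distance in [0, 1]."""
    score = SCORES.get((a, b), SCORES.get((b, a)))
    if score is None:
        raise KeyError(f"no score for {a}/{b}")
    return 1.0 - score / 100.0


def best_quartet_split(taxa: list[str]) -> tuple[frozenset, frozenset]:
    """For exactly four taxa, return the split {i,j | k,l} minimising d(i,j) + d(k,l).

    With n = 4 this is the pair that neighbor joining joins first.
    """
    if len(taxa) != 4:
        raise ValueError("need exactly four taxa")
    first = taxa[0]
    best = None
    for partner in taxa[1:]:
        pair = frozenset({first, partner})
        rest = frozenset(set(taxa) - pair)
        other_a, other_b = sorted(rest)
        cost = distance(first, partner) + distance(other_a, other_b)
        if best is None or cost < best[0]:
            best = (cost, pair, rest)
    return best[1], best[2]


if __name__ == "__main__":
    pair, rest = best_quartet_split(["sll0362", "CPn0892", "HP1241", "L0343"])
    print(sorted(pair), sorted(rest))
```

With the scores above, the cheapest split pairs sll0362 with L0343 (a total of about 1.313, against roughly 1.353 and 1.365 for the other two splits). This agrees with the grouping ClustalW drew.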