ProteinID    Organism                    Length of noncoding sequence

16331117     Synechocystis               181
15618801     Chlamydophila pneumoniae    1077
15645855     Helicobacter pylori         1107
15673706     Lactococcus                 266
			


I followed almost the same procedure as outlined in the handout.
Selecting the COGID was not very frustrating: I had to try around
5 COGIDs before arriving at the above set of proteins.


The protein (amino acid) sequences for the above proteins are:

>sll0362
MTTTPPVLSGPEIRQQFLNFFADRQHQILPSASLVPEDPTVLLTIAGMLPFKPIFLGQKSAEFPRATTSQ
KCIRTNDIENVGRTARHHTFFEMLGNFSFGDYFKSQAIAWAWELSTQVFKLPAERLVVSVFEEDDEAFAI
WRDEIGIPAHRIQRMGADDNFWVSGPTGPCGPCSEIYYDFHPELGDEKLDLEDDSRFIEFYNLVFMQYNR
DNAGNLTPLEKKNIDTGMGLERMAQILQKVPNNYETDLIFPIIQTAANIAGIDYAQANEKTKVSLKVIGD
HVRSVVHMIADGISASNLGRGYVLRRLIRRVVRHGRLLGINGEFTTKVAATAVQLAQPVYPNVLERQSLI
EQELQREEAAFLKTLERGEKLLADLMADGVTEIAGADAFTLYDTFGFPLELTQEIAEEQGITVDVEGFEK
AMQEQQERSKAAHETIDLTVQESLDKLANHIHPTEFLGYTDLQSSAIVKAVLVGGELVDQAVAGQTVQIV
LDQTPFYGESGGQIGDKGFLNGDNLLIRIEDVKRESGIFIHFGRVERGTVQIGTTITATIDRACRRRAQA
NHTATHLLQSALKRVVDEGISQAGSLVDFNRLRFDFNSPRAVTMEELQQIEDLINQWIAEAHQTEVAVMP
IADAKAKGAIAMFGEKYGAEVRVIDVPGVSLELCGGTHVANTAEIGLFKIVAETGIAAGVRRIEAVAGPS
VLDYLNVREAVVKELGDRLKAKPEEIPDRVHQLQQELKASQKQLEALKQELALQKSEQLLTQAQTVGEFK
ILVADLGTVDGESLKTAAERLQQKLGESAVVLASIPEEGKVSLVAAFSPQLVKTKQLKAGQFIGAIAKIC
GGGGGGRPNLAQAGGRDASKLPEALATAKQTLLAELG

>CPn0892
MLSNTIRSNFLKFYANRHHTILPSSPVFPHNDPSILFTNAGMNQFKDIFLNKEKVSYSRATTSQKCIRAG
GKHNDLDNVGHTSRHLTFFEMLGNFSFGDYFKAEAIAFAWEVSLSVFNFNPEGIYATVHEKDDEAFALWE
AYLPTDRIFRLTDKDNFWSMANTGPCGYCSELLFDRGPSFGNASSPLDDTDGERFLEYWNLVFMEFNRTS
EGSLLALPNKHVDTGAGLERLVSLIAGTHTVFEADVLRELIAKTEQLSGKVYHPDDSGAAFRVIADHVRS
LSFAIADGLLPGNTERGYVLRKILRRSVNYGRRLGFRNPFLAEIVPSLADAMGEAYPELKNSLSQIQKVL
TLEEESFFKTLDRGGNLLQQVLKSSSSSSCISGEDAFKLKDTYGMPIDEISLLAKDYDYSVDMDTFHKLE
QEAKERSRKNVVQSQGTSESIYNELHLTSEFIGYDHLSCDTFIEAIISKDHIVSSLQEKQEGAIVLKVSP
FYAEKGGQVGDSGEIFCSEGTFIVTHTTSPKAGLIVHHGRISQGSLTVEAAVTAQVNRYRRKRIANNHTA
CHLLHKALEITLGDHIRQAGSYVDDTKIRLDFTHPQAISPEDLLCIETLVNESIRENEPVDIREALYSDV
MNSSEIKQFFGDKYSDVVRVVSAGHSHELCGGTHAEATGDIGFFRITKEHAVAMGIRRIEAVTGEKAEAT
VHQQSEVLEEIATLLQVPRDQIVSRLTATLDERKQQDKRLNELENSLIQTKLDKLIHNCHQRQGITCLVH
HLAEHENHRLQQYAQCLHQRIPEKLISLWTTEKNGKYIVLSRVSDDLITQGVHAQDLLKAVLTPCGGRWG
GKDQSAQGSAPALPATEVLNETLWQWISTQLI


>HP1241
MDIRNEFLQFFQNKGHAVYPSMPLVPNDATLLFTNAGMVQFKDIFTGIVPRPSIPRAASSQLCMRAGGKH
NDLENVGYTARHHTLFEMLGNFSFGDYFKEEAILFAWEFVTKNLGFKPKDLYISVHEKDDEAVKLWEKFV
PVDRIKKMGDKDNFWQMGDSGPCGPCSEIYIDQGEKHFKGSEDYFGGEGDRFLEIWNLVFMQYERSNDGV
LSPLPKPSIDTGMGLERVQALLEHKLNNFDSSLFAPLMEEISELTSLDYASEFQPSFRVVADHARAVAFL
LAQGVHFNKEGRGYVLRRILRRALRHGYLMGLKEAFLYKVVGVVCEQFANTHAYLKESKEMVVKECFEEE
EHFLETLESGMELFNLSLKHLNENKIFDGKIAFKLYDTFGFPLDLTNDMLRSHGACADMQGFELCMQEQV
KRSKASWKGKQNNADFSAILNAYAPNVFVGYETTECSAKVLGFFDSDFKEITDANPNQEVWVLLEKTPFY
AEGGGAIGDRGALFKDNGEVAIVLDTKNFFGLNFSLLEIKKALKKGDQVIAQVSDERFEIAKHHSATHLL
QSALREVLGSHVSQAGSLVESKRLRFDFSHAKALNDEELEKVEDLVNAQIFKHLNSQVEHMPLNQAKDKG
ALALFSEKYAENVRVVSFKEASIELCGGIHVENTGLIGGFRIVKESGVSSGVRRIEAVCGKAFYQLAKEE
NKELKNAKTLLKNNDVIAGINKLKESVKNSQKAPVSMDLPVEKIHGVNLVVGVVEQGDIKEMIDRLKSKH
ERLLAMVFKKENERITLACGVKNAPIKANVWANEVAQILGGKGGGRGDFASAGGKDIENLQAALNLAKNT
ALKALEG

>L0343
MKTMTSAEVRQMFLDFFKSKGHTVEPSQSLVPVNDPTLLWINSGVATLKKYFDGSVVPENPRLTNAQKAI
RTNDIENVGKTARHHTMFEMLGNFSIGDYFRKEAIAFAWELLTSSEWFEFPAEKLYITYYPADKDTYNRW
VEVGVDPTHLVPIEDNFWEIGAGPSGPDTEIFFDRGEVYDPEHVGLKLLAEDIENDRYIEIWNIVLSQFN
ADPSIPRSEYPELPQKNIDTGMGLERMVCIIQGGKTNFDTDLFLPIIREIEKLSGKTYSPDSENMSFKVI
ADHIRSLSFAIGDGALPGNEGRGYVLRRLLRRAVMHGKKLGIQGKFLASLVPTVGKIMQSYYPEVLEKED
FIMQIIDREEETFNRTIDAGQKLIDELLLNLKSEGKDRLEGADIFRLYDTYGFPVELTEELAEDEGFKID
HEGFKVAMKAQQERARAAVVKGGSMGAQNETLSSIEVESEFLYEDKTTQGKLLVSIQDDEIVDEVSGKAQ
LVFDVTPFYAEMGGQVADHGVIKDAEGQVVANVLDVQHAPHGQNLHSVETLSPLKVGESYTLEIDKERRA
AVVKNHTATHLLHAALHNIVGNHALQAGSLNEVEFLRFDFTHFAQVTKEELAEIERQVNEVIWQSLKVET
VETDIATAKEMGAMALFGEKYGKNVRVVKIGDYSIELCGGTHTQTTSEIGLFKIVKEEGIGSGVRRIIAV
TGQKAYEAFKDAENTLNEVATMVKAPQTSQVLAKVTSLQDELKTAQKENDALAGKLAASQSDEIFKNVQT
AGSLNFIASEVTVPDANGLRNLADIWKQKELSDVLVLVAKIGEKVSLLVASKSSDVKAGNLVKELAPFVD
GRGGGKPDMAMAGGSKAAGIPELLAAVAEKLA
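
The sequences above are in FASTA (Pearson) format, so their lengths can be checked against the "aa" counts that ClustalW reports. The following is a minimal sketch of such a check. The function names `read_fasta` and `sequence_lengths` and the toy input are assumptions made for illustration; they are not part of the handout or of ClustalW.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {n: "".join(parts) for n, parts in records.items()}


def sequence_lengths(text):
    """Return {identifier: number of residues} for FASTA text."""
    return {name: len(seq) for name, seq in read_fasta(text).items()}


# Toy input (hypothetical, not the real proteins), just to show the call.
example = """>toyA
MTTTPPVL
SGPEIR

>toyB
MLSNTIRSNF
"""
print(sequence_lengths(example))  # {'toyA': 14, 'toyB': 10}
```

Pasting the four records above into a string and passing it to `sequence_lengths` should give one length per identifier, which can then be compared with the ClustalW log.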


CLUSTAL W (1.81) Multiple Sequence Alignments

Sequence format is Pearson
Sequence 1: sll0362         877 aa
Sequence 2: CPn0892         872 aa
Sequence 3: HP1241          847 aa
Sequence 4: L0343           872 aa
Start of Pairwise alignments
Aligning...


Sequences (1:2) Aligned. Score: 33.3716
Sequences (1:3) Aligned. Score: 34.2385
Sequences (1:4) Aligned. Score: 36.2385
Sequences (2:2) Aligned. Score: 100
Sequences (2:3) Aligned. Score: 32.4675
Sequences (2:4) Aligned. Score: 30.5046
Sequences (3:2) Aligned. Score: 32.1133
Sequences (3:3) Aligned. Score: 100
Sequences (3:4) Aligned. Score: 30.1063
Sequences (4:2) Aligned. Score: 30.5046
Sequences (4:3) Aligned. Score: 30.1063
Sequences (4:4) Aligned. Score: 100

Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences:   2      Score:12780
Group 2: Sequences:   3      Score:9601
Group 3: Sequences:   4      Score:9700
Alignment Score 10381
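
As far as I understand the ClustalW documentation, the pairwise scores listed above are percent identities: the number of identical residues in the pairwise alignment divided by the number of residue pairs compared, with gap positions excluded. The sketch below shows that calculation for two already aligned strings. The function name `percent_identity` and the toy sequences are assumptions for illustration, and the sketch does not reproduce ClustalW's own pairwise alignment step, so it will not necessarily return the same numbers as the log.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment (hypothetical): 5 compared columns, 4 identical.
print(percent_identity("MKT-LV", "MRTALV"))  # 80.0
```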

-------------------------------------------------------------------------------- Clustal output 
CLUSTAL W (1.81) multiple sequence alignment


sll0362         MTTTPPVLSGPEIRQQFLNFFADRQHQILPSASLVPE-DPTVLLTIAGMLPFKPIFLGQK
L0343           MKT----MTSAEVRQMFLDFFKSKGHTVEPSQSLVPVNDPTLLWINSGVATLKKYFDGSV
HP1241          ----------MDIRNEFLQFFQNKGHAVYPSMPLVPN-DATLLFTNAGMVQFKDIFTGIV
CPn0892         -------MLSNTIRSNFLKFYANRHHTILPSSPVFPHNDPSILFTNAGMNQFKDIFLNKE
                            :*. **.*: .: * : ** .:.*  *.::*   :*:  :*  * .  

sll0362         S-AEFPRATTSQKCIRT----NDIENVGRTARHHTFFEMLGNFSFGDYFKSQAIAWAWEL
L0343           V-PENPRLTNAQKAIRT----NDIENVGKTARHHTMFEMLGNFSIGDYFRKEAIAFAWEL
HP1241          PRPSIPRAASSQLCMRAGGKHNDLENVGYTARHHTLFEMLGNFSFGDYFKEEAILFAWEF
CPn0892         K-VSYSRATTSQKCIRAGGKHNDLDNVGHTSRHLTFFEMLGNFSFGDYFKAEAIAFAWEV
                   . .* :.:* .:*:    **::*** *:** *:********:****: :** :***.

sll0362         ST--QVFKLPAERLVVSVFEEDDEAFAIWRDEIGIPAHRIQRMGADDNFWVSGPTGPCGP
L0343           LTSSEWFEFPAEKLYITYYPADKDTYNRWVEVGVDPTHLVP---IEDNFWEIG-AGPSGP
HP1241          VT--KNLGFKPKDLYISVHEKDDEAVKLWEKF--VPVDRIKKMGDKDNFWQMGDSGPCGP
CPn0892         SLS--VFNFNPEGIYATVHEKDDEAFALWEAY--LPTDRIFRLTDKDNFWSMANTGPCGY
                      : : .: :  : .  *.::   *      *.. :     .****  . :**.* 

sll0362         CSEIYYD----FHPELGDEKLDLE--DDSRFIEFYNLVFMQYNRD---NAGNLTPLEKKN
L0343           DTEIFFDRGEVYDPEHVGLKLLAEDIENDRYIEIWNIVLSQFNADPSIPRSEYPELPQKN
HP1241          CSEIYID----QGEKHFKGSEDYFGGEGDRFLEIWNLVFMQYERS---NDGVLSPLPKPS
CPn0892         CSELLFD----RGPSFGNASSPLDDTDGERFLEYWNLVFMEFNRT---SEGSLLALPNKH
                 :*:  *       .    .      :..*::* :*:*: :::       .    * :  

sll0362         IDTGMGLERMAQILQKVPNNYETDLIFPIIQTAANIAGIDYAQANEKTKVSLKVIGDHVR
L0343           IDTGMGLERMVCIIQGGKTNFDTDLFLPIIREIEKLSGKTYSPDSEN--MSFKVIADHIR
HP1241          IDTGMGLERVQALLEHKLNNFDSSLFAPLMEEISELTSLDYASEFQP---SFRVVADHAR
CPn0892         VDTGAGLERLVSLIAGTHTVFEADVLRELIAKTEQLSGKVYHPDDSG--AAFRVIADHVR
                :*** ****:  ::    . :::.::  ::    :::.  *    .    :::*:.** *

sll0362         SVVHMIADGISASNLGRGYVLRRLIRRVVRHGRLLGINGEFTTKVAATAVQLAQPVYPNV
L0343           SLSFAIGDGALPGNEGRGYVLRRLLRRAVMHGKKLGIQGKFLASLVPTVGKIMQSYYPEV
HP1241          AVAFLLAQGVHFNKEGRGYVLRRILRRALRHGYLMGLKEAFLYKVVGVVCEQFANTHAYL
CPn0892         SLSFAIADGLLPGNTERGYVLRKILRRSVNYGRRLGFRNPFLAEIVPSLADAMGEAYPEL
                :: . :.:*   .:  ******:::** : :*  :*:.  *  .:.    .     :. :

sll0362         LERQSLIEQELQREEAAFLKTLERGEKLLADLMA----DGVTEIAGADAFTLYDTFGFPL
L0343           LEKEDFIMQIIDREEETFNRTIDAGQKLIDELLLNLKSEGKDRLEGADIFRLYDTYGFPV
HP1241          KESKEMVVKECFEEEEHFLETLESGMELFNLSLKHLN--ENKIFDGKIAFKLYDTFGFPL
CPn0892         KNSLSQIQKVLTLEEESFFKTLDRGGNLLQQVLKSSS--SSSCISGEDAFKLKDTYGMPI
                 :  . : :    **  * .*:: * :*:   :          : *   * * **:*:*:

sll0362         ELTQEIAEEQGITVDVEGFEKAMQEQQERSKAAHETIDLTVQESLDKLANHIHPTEFLGY
L0343           ELTEELAEDEGFKIDHEGFKVAMKAQQERARAAVVKGG-SMGAQNETLSSIEVESEFL-Y
HP1241          DLTNDMLRSHGACADMQGFELCMQEQVKRSKASWKGKQ--NNADFSAILNAYAPNVFVGY
CPn0892         DEISLLAKDYDYSVDMDTFHKLEQEAKERSRKNVVQSQ---GTSESIYNELHLTSEFIGY
                :  . : .. .   * : *.   :   :*::            . .   .    . *: *

sll0362         TDLQSSAIVKAVLV-GGELVDQAVAGQTVQIVLDQTPFYGESGGQIGDKGFLNGDN--LL
L0343           EDKTTQGKLLVSIQ-DDEIVDEVSG--KAQLVFDVTPFYAEMGGQVADHGVIKDAEGQVV
HP1241          ETTECSAKVLGFFDSDFKEITDANPNQEVWVLLEKTPFYAEGGGAIGDRGALFKDNG-EV
CPn0892         DHLSCDTFIEAIIS-KDHIVSSLQEKQEGAIVLKVSPFYAEKGGQVGDSGEIFCSEG-TF
                     .  :   :    . : .        :::. :***.* ** :.* * :   :   .

sll0362         IRIEDVKRESGIFIHFGRVERGTVQIGTTITATIDRACRRRAQANHTATHLLQSALKRVV
L0343           ANVLDVQHAPHGQNLHSVETLSPLKVGESYTLEIDKERRAAVVKNHTATHLLHAALHNIV
HP1241          AIVLDTKN-FFGLNFSLLEIKKALKKGDQVIAQVSDE-RFEIAKHHSATHLLQSALREVL
CPn0892         IVTHTTSPKAGLIVHHGRISQGSLTVEAAVTAQVNRYRRKRIANNHTACHLLHKALEITL
                     ..               .:         :.   *     :*:* ***: **.  :

sll0362         DEGISQAGSLVDFNRLRFDFNSPRAVTMEELQQIEDLINQWIAEAHQTEVAVMPIADAKA
L0343           GNHALQAGSLNEVEFLRFDFTHFAQVTKEELAEIERQVNEVIWQSLKVETVETDIATAKE
HP1241          GSHVSQAGSLVESKRLRFDFSHAKALNDEELEKVEDLVNAQIFKHLNSQVEHMPLNQAKD
CPn0892         GDHIRQAGSYVDDTKIRLDFTHPQAISPEDLLCIETLVNESIRENEPVDIREALYSDVMN
                ..   ****  :   :*:**.    :. *:*  :*  :*  * :    :        .  

sll0362         KG-AIAMFGEKYGAEVRVIDVPGVSLELCGGTHVANTAEIGLFKIVAETGIAAGVRRIEA
L0343           MG-AMALFGEKYGKNVRVVKIGDYSIELCGGTHTQTTSEIGLFKIVKEEGIGSGVRRIIA
HP1241          KG-ALALFSEKYAENVRVVSFKEASIELCGGIHVENTGLIGGFRIVKESGVSSGVRRIEA
CPn0892         SSEIKQFFGDKYSDVVRVVSAG-HSHELCGGTHAEATGDIGFFRITKEHAVAMGIRRIEA
                 .    :*.:**.  ***:.    * ***** *.  *. ** *:*. * .:. *:*** *

sll0362         VAGPSVLDYLNVREAVVKELGDRLKAKPE-EIPDRVHQLQQELKASQKQLEALKQELALQ
L0343           VTGQKAYEAFKDAENTLNEVATMVKAPQTSQVLAKVTSLQDELKTAQKENDALAGKLAAS
HP1241          VCGKAFYQLAKEENKELKNAKTLLK---NNDVIAGINKLKESVKNSQKAPVSMDLPVEKI
CPn0892         VTGEKAEATVHQQSEVLEEIATLLQVPRD-QIVSRLTATLDERKQQDKRLNELENSLIQT
                * *       :  .  :::    ::     ::   :    :. *  :*    :   :   

sll0362         KSEQLLTQAQTVGEFKILVADLGTVDGESLKTAAERLQQK-LGESAVVLASIPEEGKVSL
L0343           QSDEIFKNVQTAGSLNFIASEVTVPDANGLRNLADIWKQKELSDVLVLVAKIGE--KVSL
HP1241          HG---------------VNLVVGVVEQGDIKEMIDRLKSKHERLLAMVFKKENE--RITL
CPn0892         KLDKLIHNCHQRQGITCLVHHLAEHENHRLQQYAQCLHQRIPEKLISLWTTEKNGKYIVL
                :                :   :   :   ::   :  :.:       :  .  :   : *

sll0362         VAAFSPQLVKTKQLKAGQFIGAIAKICGGGGGGRPNLAQAGGRDASKLPEALATAKQTLL
L0343           LVAS-----KSSDVKAGNLVKELAPFVDGRGGGKPDMAMAGGSKAAGIPELLAAVAEKLA
HP1241          ACGV-----KNAPIKANVWANEVAQILGGKGGGRGDFASAGGKDIENLQAALNLAKNTAL
CPn0892         SRVS--DDLITQGVHAQDLLKAVLTPCGGRWGGKDQSAQGSAPALPATEVLNETLWQWIS
                          .  ::*      :    .*  **: : * ...              :   

sll0362         AELG-
L0343           -----
HP1241          KALEG
CPn0892         TQLI-
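
In the alignment above, the line under each block marks conservation: "*" for columns where all four residues are identical, and ":" and "." for columns with conservative substitutions. The sketch below recomputes only the "*" markers, since the ":" and "." classes depend on ClustalW's residue groups, which are not reproduced here. The function name `identity_markers` is an assumption for illustration; the input rows are a 20-column excerpt from the second alignment block.

```python
def identity_markers(rows, gap="-"):
    """Return a string with '*' under fully identical, gap-free columns."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")
    markers = []
    for column in zip(*rows):
        first = column[0]
        conserved = first != gap and all(res == first for res in column)
        markers.append("*" if conserved else " ")
    return "".join(markers)


# Excerpt from the second block (sll0362, L0343, HP1241, CPn0892).
rows = [
    "TARHHTFFEMLGNFSFGDYF",
    "TARHHTMFEMLGNFSIGDYF",
    "TARHHTLFEMLGNFSFGDYF",
    "TSRHLTFFEMLGNFSFGDYF",
]
markers = identity_markers(rows)
print(markers)
print(markers.count("*"), "of", len(markers), "columns fully conserved")
```

Counting the "*" markers over the whole alignment gives a quick measure of how strongly the four proteins are conserved.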
                     
-------------------------------------------------------------------------------- Tree construction 




 CLUSTAL W (1.81) Multiple Sequence Alignments



Sequence format is Clustal
Sequence 1: sll0362         905 aa
Sequence 2: L0343           905 aa
Sequence 3: HP1241          905 aa
Sequence 4: CPn0892         905 aa
Phylogenetic tree file created:   [clustal.ph]


     +----------------------------------------------------1:sll0362
  +--6  
  |  +---------------------------------------------------------2:L0343
--5  
  |  +-----------------------------------------------------------3:HP1241
  +--7  
     +----------------------------------------------------------------4:CPn0892

Midpoint-rooted tree:
         +--------------------------------------------------------1:sll0362
     +---6  
  +--7   +-------------------------------------------------------------2:L0343
  |  |  
--5  +----------------------------------------------------------------3:HP1241
  |  
  +-------------------------------------------------------------------4:CPn0892


 Remember, this is an unrooted tree! 
-------------------------------------------------------------------------------- Bootstrapping tree 




 CLUSTAL W (1.81) Multiple Sequence Alignments



Sequence format is Clustal
Sequence 1: sll0362         905 aa
Sequence 2: L0343           905 aa
Sequence 3: HP1241          905 aa
Sequence 4: CPn0892         905 aa
Phylogenetic tree file created:   [clustal.ph]

No bootstrap generated 
This is probably caused by too few input sequences. CLUSTAL needs more than three sequences in order to be able to calculate a phylogenetic tree. 


--------------------------------------------------------------------------------

RunClustalW--V1.4--21-Oct-2002/JackL