Opsin evolution: annotation tricks: Difference between revisions

From genomewiki
Jump to navigationJump to search
No edit summary
No edit summary
Line 205: Line 205:
  <font color="#990099" face="Courier" si e="2">MEL2_helRo                                                                                                                    TPILRTHANVLIINLA-LCDLIFSSLIGFPMTALSCFKRHWIWGD--L</font>
  <font color="#990099" face="Courier" si e="2">MEL2_helRo                                                                                                                    TPILRTHANVLIINLA-LCDLIFSSLIGFPMTALSCFKRHWIWGD--L</font>
                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                
=== Embedded active web pages1 ===
<embedurl>http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html{scrolling=auto}</embedurl>
=== Embedded active web pages2 ===
<embedurl>
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&BLAST_SPEC=TraceArchive&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch{width=1000}{height=800}
</embedurl>
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                        
[[Category:Comparative Genomics]]
[[Category:Comparative Genomics]]

Revision as of 11:57, 28 May 2008

Quick Tricks to Annotating Remotely Related Opsins

Managing Hundreds of Opsin Sequences

As the number of opsin reference sequences increases, their management becomes more difficult. For example, aligning all 238 opsins at once is highly desirable but the result will spread out inconveniently over many dozens of jumbo computer screens. It turns out that certain opsin classes are signficantly expanded in length at either or both termini. To the extent these are conserved within that opsin class over long total branch lengths, these extensions are important functionally (but in some unknown way). While that interest is narrowly defined, the offending residues cannot simply be discarded because it is primary data.

Consequently it is helpful to reversibly trim off these ends as needed, indeed trim down to the maximal conserved core. Sequences need to be identical in lengths for certain comparative genomic web tools, ie gapped out to all match in length. That can't be done just once because the reference sequence set is constantly growing (or shrinking).

Another common task is to compare variation just in a restricted subset of residues, perhaps in an adjacent patch but perhaps co-evolving residues scattered along the entire linear length, say all the counterion residues associated with the retinal Schiff base. Intervening extraneous residues are a major distraction in this situation. To understand covariance in the output, it may be necessary to sort and categorize the resultant classes.

All this and much more could be managed in a custom-programmed relational database such as open source MySQL. However an ever-shifting research agenda requires endless reprogramming of database capabilities and interface, with no server support within genomeWiki. That development could be justified for universal gene family data management software. However every gene family raises issues with no counterpart elsewhere and opsins are no exception. There has to be something simpler.

Because opsins are easily alignable even in remotely related homologs due to their many conserved and dispersed internal anchors, it doesn't matter which of ClustalW, T-Coffee, MultAlin, Blast, etc is used to align them. These tools do differ extensively in the ability of the user to control output. It turns out that MultAlin web tool by Florence Corpet has just the properties we need to trim and manage dispersed covariant residues, among them a fast and stable server very tolerant of extraneous characters in fasta sequences -- nothing to buy, nothing needing download and installation.

To trim off extraneous residues C-terminally, simply paste in all or any subset of the Opsin Classifier reference set. Run Multalin at default settings and note the position number in the alignment where trimming should occur. Re-run the sequences, now with the 'maximum line length' parameter set to the position number. This will cause the alignment output to be in two blocks, the second of which is throwaway.

Next, recover trimmed sequences in fasta format so a similar process can be applied N-terminally to trim that. At the bottom of the page, click on the 'results as an html page' option to get paste the initial alignment graphic to text that can be copied. In any text editor, install the first fasta header symbol '>" by replacing carriage returns with 'carriage return >'. Then replace double spaces (which only occur after the sequence names with carriage return.

Resubmit the resulting carboxyly trimmed fasta sequences, now specifying where the N-terminal trim should occur. The top block will be trim but the rest the desired double-trim. That may be a satisfactory stopping point. However to recover doubly trimmed fasta sequences for other web tools, the second and later blocks must be consolidated by pasting side by side and discarding redundant name columns in spreadsheet or 1xn html table.

To isolate small blocks or individual residues of interest, simply set the 'graduation step' to create the appropriate flanking single spaces. Replace those with tabs, put in any desktop spreadsheet and delete unwanted columns. This leaves a column with the original sequence name tags (first word only -- note the Opsin Classifier has looked way ahead on this) and columns of desired residues. The graduation step can even be set to 1, which places every residue in its own column. Any combination of non-adjacent residues can be brought side by side using a simple custom row to direct horizontal sorts (and a second to allow restores). Comparative genomics then consists of various vertoca; sort orders, as illustrated in the section on counterions to the Schiff base lysine.

This whole process takes 2-3 minutes to perform -- it's not a hack because it uses Multalin features more or less as intended. You're back to biology in minutes instead of spending months writing MySQL code.


Using the Opsin Classifier to determine ancestral introns in ciliary opsins

Here we begin with a tunicate opsin (blue) of somewhat uncertain affinities within the Opsin Classifier. Three of its introns (orange) correspond exactly in position and phase to those of the full range of opsins -- from cone to pineal to encephalopsin (but not melanopsin, RGR, or protostomal opsins: not shown). Another 3 introns (magenta) have no counterparts in other opsin genes; these are tunicate-specific. (Intron gain and loss are unusually common in Ciona generally.)

Using the Opsin Classifier to align the tunicate opsin to a full spectrum of (representative) frog opsins, residues homologous to tunicate exon ends can be reliably located. These are shown in the same color scheme. However some of these latter opsins have intron gains of their own (phases shown in teal). The data fit very parsimoniously with 3 introns in the ancestral protein, the first of phase 2 and the others phase 0. These 3 introns were already established at the time of amphioxus and tunicate encephalopsin divergences, indeed sea urchin and even ragworm.

Opsin ancient introns.png


Using the Opsin Classifier to compare introns across Lophotrochozoa and Vertebrates

The snapshot of a colored alignment makes clear how to compare intron position and phase in melanopsins from widely diverged species. The magenta coloring in the query sequence from Schistostoma shows 3 exon ends followed by numbering 0,1,2 to indicate phase (basepairs extending beyond the last coding triplet). The Schmidtea ortholog is colored similarly.

Next, the query sequence is run against the full set of opsins in the classifier. The alignment to stickleback is important because the genome project there has reliably established where and how the exon breaks occur. Using the Schistosoma exon termini in a web browser text search locates the homologous peptides in stickleback. These are colored red in the alignment and also where located in the intronated sequence. It turns out exon breaks and phasing exactly match those of Schistosoma (for the 3 exons considered but not all).

However the peptides transfered homologously to tick and bee (and vertebrate ciliary opsins, not shown) fail to correspond to exon breaks. This establishes some specificity -- the matching introns are not some generic property of all GPCR nor all rhodopsin-class receptors

Opsin loph mel introns.png

Using the Opsin Classifier to annotate a new opsin in hemichordate

Below is a step-by-step example showing how the Opsin Classifier is used within an overall annotation strategy for recovering intronated opsin homologs in an unstudied species, here the hemichordate Saccoglossus kowalevskii. It's best to follow along actively replicating the steps on actual live web tools.

Here we expect to find minimally an opsin that underlies epidermal photoreception. The best inital queries (for detecting diverged sequences) are likely fellow Ambulacraria (ie sea urchin) opsins. The best initial target is Trace Archives Saccloglossus 'other' because these are transcripts which give longer blastn matches than dispersed exons. However this draws a blank, possibly because cell types with opsin transcripts are exceedingly rare. Hence we move to the "wgs" division.

A flawed but long sea urchin dna query, XM_778236, a GNOMON pipeline product labelled "similar to Go-coupled rhodopsin" elicits a significant hit provided the "Somewhat similar sequences (blastn)" trace query option has been selected. This option is critical for queries across diverged species or diverged opsin classes.

The key feature of this match is length. The rule of thumb is anything exceeding 40 bp will prove informative; here we have 71 bp. The percent identity is quite respectable at 76% given the time elapsed since sea urchin and acornworm divergence -- and this may be even be a match of paralogs. Note however the two gaps. The second can be written off as a slight gap misplacement by blastn. The first cannot. It represents an apparent inactivating frameshift (change in reading frame). However traces contain many artefactual indels and most likely this is sequencing error. We cannot rule out a recent processed pseudogene at this point -- indeed they give better initial hits than a multi-exonic gene.

The trace blast graphic shows five stacked hits. I selected one of these because its length was good at 917 bp and sufficient material was left over on both flanks, that is 754 bp upstream and 94 bp downstream.

>gnl|ti|1695985935 Length=917 Score = 41.0 bits Expect = 9.0 Identities = 54/71 (76%), Gaps = 3/71 (4%) Strand=Plus/Minus

Query  474  CCTT-TTCTGGACTATCACACCGTTCTTTGGATGGAGCAGCTACAC-CTACGAACCATTTGGCACGTCGTG  542
            |||| |||||| | || |  ||| | |||||||||||||||||  | ||| ||||||   || || |||||
Sbjct  163  CCTTATTCTGGTCGATGATGCCGCTGTTTGGATGGAGCAGCTATGCGCTA-GAACCAGAAGGTACATCGTG  94

Next, the retrieved full length trace is back-blastxed against the Opsin Classifier collection. This gives three immediate benefits: properly translating the trace irrespective of its frameshifts (which surface as exon breaks with breaks too short for an intron in the numbering), finding the best available gene model (the initial query choice may have beeen sub-optimal), and extending match length over what Trace Archive blastn could do.

Here we see that the top 15 matches for trace ti|1695985935 are consistently opsins previously classified as peropsins and neuropsins. There appear to be two disjoint segments to the blastx match to PERa_braFlo Branchiostoma floridae but the numbering shows they really reflect a single extended match encompassing a frameshift. Here the trace was on the minus strand which means larger numbers are earlier in the coding sequence. The two fragments can then be joined into a single polypeptide, likely with a one amino acid glitch at the frameshift join. Blast often extends its matches too far into bogus territory and this must be trimmed.

PERa_braFlo     Branchiostoma floridae (amphioxus) ?? ... -1   182  3.6e-28   2
PERa_braBel     Branchiostoma belcheri (amphioxus) ?? ... -1   174  5.7e-27   2
PER_xenTro      Xenopus tropicalis (frog) ?? 0.2.0.2.1... -1   195  4.8e-26   2
NEUR_calMil     Callorhinchus milii (elephantfish) ?? ... -2   151  1.9e-24   2
PER_gasAcu      Gasterosteus aculeatus (stickleback) ?... -1   174  4.2e-23   2
...

Score = 167 (58.8 bits), Expect = 3.6e-28, Sum P(2) = 3.6e-28 Identities = 28/60 (46%), Positives = 41/60 (68%), Frame = -2

Query:   769 GLTIFGMSLSCVSSFAGRWLFGKFGCYFHGFAGMLFGLGSIGNLTVISIDRYIITCKRNL 590
             G+ IFG   S  SS    WLFG  GC ++GF GM FG+ +IG LT +++DRY++ C+++L
Sbjct:    83 GICIFGYPFSGASSLRSHWLFGGVGCQWYGFNGMFFGMANIGLLTCVAVDRYLVICRQDL 142


Score = 182 (64.1 bits), Expect = 3.6e-28, Sum P(2) = 3.6e-28 Identities = 35/82 (42%), Positives = 50/82 (60%), Frame = -1

Query:   266 YVYSCNQNFNYKLHLFTEWSYRHYYALLAVAWSNALFWSMMPLFGWSSYALEPEGTSCTIDWMNNDNQYISYVSCVTVTCFI 21
             Y+  C Q+   K++      Y  Y  + A+ W  A FW+ +PL GW+ Y+LEP GT+CTI+W  ND+ YISYV+    +CFI
Sbjct:   134 YLVICRQDLVDKVN------YNTYGVMAALGWLFAAFWAALPLVGWAEYSLEPSGTACTINWQKNDSLYISYVT----SCFI 205

The two fragments can then be joined into a single polypeptide, likely with a one amino acid glitch at the frameshift join. This 133 residue peptide is blastped against Opsin Classifier. Here the Expect = 5.4e-33 is highly encouraging but the two small insertions raise questions for an opsin. Perhaps extraneous residues have been incorporated into our gene model, currently a fragment corresponding to residues 83-197 of the 361 residue peropsin in amphioxus.

Switching to a six-frame translation view in a second browser tab, we see GLTI is preceded by a stop codons without good opportunities for a GT-AG splice site that does not sacrifice regions of apparent alignment. This could reflect additional frameshifts but without any guidance from blastp output, we have no idea what frame might contain further good upstream sequence. Next we look for exon boundary guidance from the best gene model amphioxus PERa_braFlo. Here we see GLTI is towards the end of an exon. If we look to GRWL as the valid start of the alignment, it lacks a supporting phase 0 intron of PERa_braFlo

Downstream there is a GT splice option for a phase 2 intron at YISYV (which would eliminate the insert) or an option for adding IRSKTDTTFVDT followed by a phase 0 splice start (and a stop codon). However that extension has no support in any known opsin. Further, there is no support from PERa_braFlo exon boundaries.

The information in this trace has not yet been exhausted because we can re-blastn against the trace archives, perhaps finding a trace that stayed on task. Indeed this picks up two high-identity trace with clear multiple exons, ti|1723199539 and ti|1705099698, supporting the notion that the original trace reflects a processed pseudogene. By recursively using the Opsin Classifier with further rounds of trace archive searching (to tile out ends), we eventually arrive at a 246 residue fragmentary gene model with consistent top matches to peropsins and neuropsins, sharing two appropriate introns with correct phasing but also two fused introns with respect to later deuterostomes. This fragment is a good candidate for ortholog as these genes are single copy in all studied species.

The fragmentary gene is best further studied after release of acornworm contig assembly.

>PER_sacKol Saccoglossus kowalevskii Expect = 2.0e-49 PERa_braFlo Identities = 97/246 (39%)
IIYYFFLLSTGLTIFGMSLSCVSSF GRWLFGKFGCYFHGFAGMLFGLGSIGNLTVISIDRYIITCKRSL 1
2 WSYRHYYALLAVAWSNALFWSMMPLFGWSSYALEPEGTSCTIDWMNNDNQYISYVSCVTVTCFILPCAVMTYDYLAAYMKMVKAGYTLSEETEKPNND 0
0 MCIALVAAFLLSWFPSATVFLWAAFGNPGNIPLSFTGVADAFTKIPAVFNPVIYVALNPEFRKYFGKTIGCRRKRKKPIAVRLNGSEQNVENTI* 0

PERa_braFlo     Branchiostoma floridae (amphioxus) ?? 0.2...   484  2.0e-49   1
PERa_braBel     Branchiostoma belcheri (amphioxus) ?? 0.2...   472  3.8e-48   1
PER_monDom      Monodelphis domestica (opossum) ?? 0.2.0....   443  4.5e-45   1
PER_xenTro      Xenopus tropicalis (frog) ?? 0.2.0.2.1.0....   442  5.7e-45   1
PER_gasAcu      Gasterosteus aculeatus (stickleback) ?? 0...   436  2.5e-44   1
PER_homSap      Homo sapiens (human) ?? 0.2.0.2.1.0.1 ind...   435  3.2e-44   1
PER_galGal      Gallus gallus (chicken) ?? 0.2.0.2.1.0.1 ...   418  2.0e-42   1
NEUR_gasAcu     Gasterosteus aculeatus (stickleback) ?? 0...   304  3.9e-41   2
NEUR_ornAna     Ornithorhynchus anatinus (platypus) ?? 0....   396  4.3e-40   1
NEUR_monDom     Monodelphis domestica (opossum) ?? 0.2.2....   390  1.9e-39   1
NEUR_homSap     Homo sapiens (human) ?? 0.2.2.2.0.1 indel...   383  1.0e-38   1
NEUR_galGal     Gallus gallus (chicken) ?? 0.2.2.2.0.1 in...   383  1.0e-38   1
NEUR_xenTro     Xenopus tropicalis (frog) ?? 0.2.2.2.0.1 ...   370  2.4e-37   1
NEUR_anoCar     Anolis carolinensis (lizard) ?? 0.2.2.2.0...   365  8.3e-37   1

genomeWiki display experiments

We're looking for a way to color introns according to their phases. genomeWiki code resists displaying colors within preformatted html text

mmmmmm
bbbbbbbbbbbbbbbbbb
dddddd
dddddd

kkkkkkkk pppp jlkljlk hygvkkg lyglyggy

bbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbb
>RHO1_bosTau
MNGTEGPNFYVPFSNKTGVVR--SPFEAPQYYLAEPWQFSMLAAYMFLL-IMLGFPINFLTLYVTVQHKK--LRTPLNYILLNLA-VADLFMVFGGFTTTL-YTSLHGYFVFGP---TGCNLEGFFATLGGEIALWSLVVLAIER-YVVVCKPMS-NFRF------GENHAIMGVA-FTWVMALAC-AAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVF-TVKE------AAAQQQESA-----------TTQKAEKEVTR--MVIIMVIAFLICWLPYAGVAFYIFTH---QGSDFGPIFMTIPAFFAKTSAVYNPV-IYIMMNKQFRNCMVTTLCCGKNPL--GDDEASTTVS--KTETSQVAPA              


RHO1_ osTa                                                MNGTEGPNFYVPFSNKTGVVR--SPFEAPQYYLAEPWQFSMLAAYMFLL-IMLGFPINFLTLYVTVQHKK--LRTPLNYILLNLA-VADLFMVFGGFTTTL-YTSLHGYFVFGP---TGCNLEGFFATLG GEIALWSLVVLAIER-YVVVCKPMS-NFRF------GENHAIMGVA-FTWVMALAC-AAPPLVGWSR YIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVF-TVKE ----AAAQ--QQES------------ATTQKAEKEVT-RMVIIMVIAFLICWLPYAGVAFYIFTH---QGSDFGPIFMTIPAFFAKTSAVYNPV-IYIMMNKQ FRNCMVTTLCCGKNPL--GDDEASTTVS--KTETSQVAPA                                                                                                                                                                                                                                                                                                               
RHO2_galGa                                                MNGTEGINFYVPMSNKTGVVR--SPFEYPQYYLAEPWKYRLVCCYIFFL-ISTGLPINLLTLLVTFKHKK--LRQPLNYILVNLA-VADLFMACFGFTVTF-YTAWNGYFVFGP---VGCAVEGFFATLG GQVALWSLVVLAIER-YIVVCKPMG-NFRF------SATHAMMGIA-FTWVMAFSC-AAPPLFGWSR YMPEGMQCSCGPDYYTHNPDYHNESYVLYMFVIHFIIPVVVIFFSYGRLIC-KVRE ----AAAQ--QQES------------ATTQKAEKEVT-RMVILMVLGFMLAWTPYAVVAFWIFTN---KGADFTATLMAVPAFFSKSSSLYNPI-IYVLMNKQ FRNCMITTICCGKNPF--GDEDVSSTVSQSKTEVSSVSSSQVSPA                                                                                                                                                                                                                                                                                                          
SWS2_galGa                                           MHPPRPTTDLPEDFYIPMALDAPNITALSPFLVPQTHLGSPGLFRAMAAFMFLL-IALGVPINTLTIFCTARFRK--LRSHLNYILVNLA-LANLLVILVGSTTAC-YSFSQMYFALGP---TACKIEGFAATLG GMVSLWSLAVVAFER-FLVICKPLG-NFTF------RGSHAVLGCV-ATWVLGFVA-SAPPLFGWSR YIPEGLQCSCGPDWYTTDNKWHNESYVLFLFTFCFGVPLAIIVFSYGRLLI-TLRA ----VARQ--QEQS------------ATTQKADREVT-KMVVVMVLGFLVCWAPYTAFALWVVTH---RGRSFEVGLASIPSVFSKSSTVYNPV-IYVLMNKQ FRSCMLKLLFCGRSPF--GDDEDVSGSSQ-ATQVSSVSSSHVAPA                                                                                                                                                                                                                                                                                                          
SWS1_galGa                                                  MSSDDDFYLFTNGSVP-----GPWDGPQYHIAPPWAFYLQTAFMGIV-FAVGTPLNAVVLWVTVRYKR--LRQPLNYILVNIS-ASGFVSCVLSVFVVF-VASARGYFVFGK---RVCELEAFVGTHG GLVTGWSLAFLAFER-YIVICKPFG-NFRF------SSRHALLVVV-ATWLIGVGV-GLPPFFGWSR YMPEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIVPLSLIIFSYSQLLS-ALRA ----VAAQ--QQES------------ATTQKAEREVS-RMVVVMVGSFCLCYVPYAALAMYMVNN---RDHGLDLRLVTIPAFFSKSACVYNPI-IYCFMNKQ FRACIMETV-CGK-PL--TDDSDASTSAQ-RTEVSSVSSSQVGPT                                                                                                                                                                                                                                                                                                          
LWS_homSap                               MAQQWSLQRLAGRHPQDSYEDSTQSSIFTYTNSNSTR --GPFEGPNYHIAPRWVYHLTSVWMIFV-VIASVFTNGLVLAATMKFKK--LRHPLNWILVNLA-VADLAETVIASTISV-VNQVYGYFVLGH---PMCVLEGYTVSLC GITGLWSLAIISWER-WMVVCKPFG-NVRF------DAKLAIVGIA-FSWIWAAVW-TAPPIFGWSR YWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCITPLSIIVLCYLQVWL-AIRA ----VAKQ--QKES------------ESTQKAEKEVT-RMVVVMVLAFCFCWGPYAFFACFAAAN---PGYPFHPLMAALPAFFAKSATIYNPV-IYVFMNRQ FRNCILQLF--GKKV-----DDGSELSSASKTEVSSVSS--VSPA                                                                                                                                                                                                                                                                                                          
PIN_galGal                                                   MSSNSSQAPPNGTPGPF---DGPQWP-YQAPQSTYVG-VAVLMGTV-VACASVVNGLVIVVSICYKK--LRSPLNYILVNLA-VADLLVTLCGSSVSL-SNNINGFFVFGR---RMCELEGFMVSLT GIVGLWSLAILALER-YVVVCRPLG-DFQF------QRRHAVSGCA-FTWGWALLW-STPPLLGWSSYVPE GLRTSCGPNW-YTGG-SNNNSYILSLFVTCFVLPLSLILFSYTNLLL-TLRA ----AAAQ--QKEA------------DTTQRAEREVT-RMVIVMVMAFLLCWLPYSTFALVVATH---KGIIIQPVLASLPSYFSKTATVYNPI-IYVFMNKQ FQSCLLEMLCCGYQPQ--RTGKASPGTPGPHADVTAAGLRNKVMPAHPV                                                                                                                                                                                                                                                                                                      
VAOP_galGa                                              MDVFRALGNESLLSNSSGPARW---DPFHHP-LDSIQPWHFRLVAAVMFVV-TSLSLAENLAVILVTFKFKQ--LRQPVNYVIVNLS-VADFLVSLTGGTISF-LANLKGYFYMGH---WACVLEGFAVTFF GIVALWSLALLAFER-YIVICRPVG-NMRL------RGKHAAQGIA-FVWTFSFIW-TIPPTMGWSSYTTSK-IGTTCEPNW YSGA-YNDRSYIIAFFTTCFIVPLLVILVSYGKLLQ-KLRK ----VSNT--QGRL------------RTARKPERQVT-RMVVVMIIAFLICWMPYAVFSILATAY---PSIELDPHLAAIPAFFSKTATVYNPI-IYVFMNKQ FRMCLIQMFKCSAIET--AESNMNPTSERATLTQDKRDSQLSVMAVRSTILKRKTGDEHRADDLWLFRQLQKPKCVPCRAGDGS                                                                                                                                                                                                                                                                   
PPIN_anoCa                                                            MDSLDTNTLS-PNASTVRVVLMPRIGYTIIAIIMATS-CTLSVILNTAVIAITIKYRQ--LRQPINYSLVNLA-IADLGAALLGGSLNV-ETNAVGYYNLGR---VGCVTEGFAMAFF GIVALCTIAVIAVDR-AIVIAKPMG-TITF------TTRKAMIGVA-VSWIWSLVW-NTPPLFGWGG-YQMEGVMTSCAPDW--ANSDPINVSYIICYFLFCFTIPFITILASYGYLIW-TLRQ ----V-AKVGLAQR------------GSTTKAEAQVS-RMVIVMVMAFLICWLPYATFALVVVGN---PQIYINPIIATIPMYMAKSSTFYNPI-IYIFMNKQ FRDCLVRCLLCGRNPC--ASEQTDEDDLEVSTIAPAPSSRRGKVAPV                                                                                                                                                                                                                                                                                                        
PARI_anoCa                                                            MENESSLVLEGAEGYIVRPTIFPRAGYGVLAFLMFIN-ALFSLFNNFLVIAVTLKNPQ--LRNPINIFILNLS-FSDLMMSICGTTIVI-ATNYHGYFYLGR---RFCIFQGFAVNYF GIVSLWSLTILAYER-YNVVCQPLG-TLQM------STQRAYQLLG-FIWVFCLFW-AVVPLFGWSS-YGPEGVQTSCSIGW--EERSWNNYSYLIVYFLSCFFIPVLIIGFSYGNVIR-SLHG ----LNKKVEQLGG------------KSNPEEEFRAV-IMVLVMVVAFLICWLPYTLFALTVVFN---PALNISPLAATIPTYLSKTSPVYNPI-IYIFLNKE FRECAVEFITCGKVVL--TSPEEDISTSAISDEGIAPC-KINQVTPV                                                                                                                                                                                                                                                                                                        
CIL2_plaDu                                           MDDLGFLGNSSVN--YTVPLLQED----PLLLRILY--FGPTSYVITAIYLCIV-GVIGTLSNGVIMYLYFKDKS--LRSPMNLLFVNLA-MSDFTVAFFGAMFQFGLTCTRKYMSPGM---ALCDFYGFITFLG GLASEMNLFIISVER-YLAVVRPFD-VGNL------TNRRVIAGG VFVWLYSLVF-AGGPLVGWSS-YRPEGLGTWCSISW--QDRSMNTMSYVTAVFLGCYFFPVSIIIFCYFNVWR-KVKE ----AADA--QGGA------------GTAGKAEKSIF-RMSVIMVTCYLTAWTPYAIVCLIASYG---PPNGLPIYAEVLPSLFAKSSQVYNPI-IYVLMNKP YRSALVSLVCRGRNPF--DEAGGTAGGTTAKDETLGKGNKVAAA                                                                                                                                                                                                                                                                                                           
CIL1_plaDu                                   MDGENLTIPNPVTELMDTPINSTYFQNLNAET----DGGNHYIYNAFTATDYNICAAYLFFI-ACLGVSLNVLVLVLFIKDRK--LRSPNNFLYVSLA-LGDLLVAVFGTAFKFIITARKTLLREED---GFCKWYGFITYLG GLAALMTLSVIAFVR-CLAVLRLGS-FTGL------TTRMGVAAMA FIWIYSLAF-TLAPLLGWNH-YIPEGLATWCSIDW--LSDETSDKSYVFAIFIFCFLVPVLIIVVSYGLIYD-KVRK ----VAKT-----G------------GSVAKAEREVL-RMTLLMVSLFMLAWSPYAVICMLASFG---PKDLLHPVATVIPAMFAKSSTMYNPL-IYVFMNKQ FRRSLKVLLGMGVEDL--NSESERATGGTATNQVAAT                                                                                                                                                                                                                                                                                                                  
ENC_homSap                                              MYSGNRSGGHGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSI-GLLGVGNNLLVLVLYYKFQR--LRTPTHLLLVNIS-LSDLLVSLFGVTFTF-VSCLRNGWVWDT---VGCVWDGFSGSLF GIVSIATLTVLAYER-YIRVVH-----ARV------INFSWAWRAITYIWLYSLAW-AGAPLLGWNRYILDVHG-LGCTVDWKSKDA--NDSSFVLFLFLGCLVVPLGVIAHCYGHILY-SIRM LRCVEDLQTIQVIK--------------ILKYEKKLA-KMCFLMIFTFLVCWMPYIVICFLVVNG---HGHLVTPTISIVSYLFAKSNTVYNPV-IYVFMIRK FRRSLLQLLCLRLL---------RCQRPA-KDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKT--NGSKVDVIQVRPL                                                                                                                                                                                                                                                    
ENC4_ raFl                                MALYNNTSSPSQDLLWDAPYSQGHIWDNSSASNSSEDVMDQGKVELQDFSDAG-YTAIATCLALI GFVGFTNNFVVILLIGCHRQ--LRTPFNLLLLNMS-VADLLVSVCGNTLSF-ASAVRHRWLWGR---PGCVWYGFANSLF GIVSLVTLSALAFER-YCVVVR----SSDM------LTYKSSLVVITFIWLYSLLW-TSLPLLGWSSYQFEGHN VGCSVNWVQHNP--DNVSYIVTLMVTCFFVPMVVVCWSYAWIWR-TVRM ---SSE-AKPECGN--------------SQNAGRLVT-TMVVVMIICFLVCWTPYAVMALIVTFG---ADHLVTPTASVIPSLVAKSSTAYNPI-IYVLMNNQ FREF---LLARLQRV--------CCRQQA-VPRVTPMDDNVHV-RLGGEGPSQSQQFLPAGENVENVDMLEYVQENCKPKADSL--STISE                                                                                                                                                                                                                                                            
ENC5_ raFl                                             MLGMHNVMNATDYDNNNATFAAWNFQRNGTTE-EEVEFSG-FDTVAVVIAAI-GIAGFLSNGAVVLLFLKFRQ--LRTPFNMLLLNMS-VADLLVSVCGNTLSF-ASAVRHRWLWGR---PGCVWYGFANHLF GLVSLISLAVISYER-YRMVVKPKGPGSSY------LTYNKVGLAIIFIYLYCLLW-TTLPIVGWSSYQLE GPKISCSVAWEEHSL--SNTSYIVAIFIMCLLLPLLIIIYSYCRLWY-KVKK ---GSQNLPPAIRK--------------SSQKEQKIA-RMVVVMITCFLVCWLPYGAMALVVSFG---GESLISPTAAVVPSLLAKSSTCYNPL-VYFAMNNQ FRRYFQDLLCCGRRLFDASASVNTCNTSA-MPRHSPVFQKPDSDQYNGIQKSREPQMRTTGQNAPYRQWIEMQTIAVVVKADEV--NNKFGEVKT                                                                                                                                                                                                                                                        
PIN_stoPur                                               MSNLMTGLVTNVNALSGIGNETPTTIGLSSLVVPVSRTTYNYLTVYTGFL-TIFGILNNGIVMILFARFPS--LRHPINSFLFNVS-LSDLIISCLASPFTF-ASNFAGRWLFGD---LGCTLYAFLVFVA GTEQIVILAALSIQR-CMLVVRPF--TAQK------MTHRWALFFISLTWIYSLII-CVPPLFGWNRYTYEGPGT ACSVAWNSPSP--GDTSYIIFIFVLVLVIPFGIIIFCYGLLVY-AVKK ----ISRTQAALSS--------------EAKADRKVS-KMIFIMILFFLIAWTPYTGFSLYVTFG---KNVVITPLAGTFPPFFAKLCTIHNPI-IYFLLNKQ FKDALIQLFCCGENPFDRDESEHEGRGGRHRHRTAPSATAHIGGRGRASSLPTATSMLDIPQAASTAASSSGKTQNKESLEKGP--STSETTNKRVFELSSKIQKFEISEKNNTPSSSELPGASSLSGALMPPRRAMKNQVGCLPPVDN                                                                                                                                                                                                  
ENC1_anoGa                                                               MYDV-TDAAAINSDHQELMAPWAYNGAAVTLFFI-GFFGFFLNIFVIALMYKDVQ- LWTPMNIILFNLV-CSDFSVSIIGNPLTL-TSAISHRWLYGK---SICVAYGFFMSLL GIASITTLTVLSYER-FCLISRPF--AAQN------RSKQGACLAVLFIWSYSFAL-TSPPLFGWGAYVNEAANI SCSVNWESQTA--NATSYIIFLFIFGLILPLAVIIYSYINIVL-EMRK ------NSA-RVGR--------------VNRAERRVT-SMVAVMIVAFMVAWTPYAIFALIEQFG---PPELIGPGLAVLPALVAKSSICYNPI-IYVGMNTQ-FRAAFWRI--RRSNGVAGQPDSNNTNNSNRDKESARHTAKEG-LECSLDFCHWTVR-GTRVSISSAERNVPAPAARERSGGHSV--TGSREESRDRHVTLKTMLSVGPRSPSSVAPVAADCSTTDVPTSGDGSVRIVRQDSELSVIHDGGGGGGGSSSRVLVIKSQKPRSNML                                                                                                                                                                          
ENC2_anoGa                                                               MNDAPNDVAASAVDYEDLMAPWAYNASAVTLFFI-GFFGFFLNLFVIALMCKDMQ- LWTPMNIILFNLV-CSDFSVSIIGNPLTL-TSAISHRWIFGR---TLCVAYGFFMSLL GITSITTLTVLSYER-YCLISRPF--SSRN------LTRRGAFLAIFFIWGYSFAL-TSPPLFGWGAYVQEAANI SCSVNWESQTK--NATTYIIFLFVFGLVVPLIVIVYSYTNIIV-NMRE ------NSA-RVGR--------------INRAEQRVT-SMVAVMIVAFMVAWTPYAIFALIEQFG---PPELIGPGLAVLPALVAKSSICYNPI-IYVGMNTQ-FRAAFSRV--RNK----GQQAAADQNTTTMQRELTK-SSRDM-VECSFDFCR---K-KSRFKISLVKPTAPLAVVDVSSTSHRD--KGTSRSPLDQTVLNETNEDVGRERSGGGGGGGAYAGT-----------RFVRPDFELSVINSG---------KSILIKSKNFRSNLL                                                                                                                                                                          
ENC_aedAeg                                                                                 MESWAYVASAVTLFFI-GFFGFFLNLFVIALMCKDVQ- LWTPINIILFNLV-CSDFSVSIIGNPFTL-TSAISRHWIFGR---TVCIAYGFFMSLL GITSITTLTVLSYER-FCLISHPF--SSRS------LSRRGAVFAILFIWSYSFAL-TSPPLFGWGAYVNEAANIS CSVNWESQTL--NATSYIIFLFVFGLVVPLVVIVYSYTNIVV-NMKR ------NAA-RVGR--------------INRAEKRVT-RMVFVMVLAFMIAWTPYAVFALIEQFG---PTDIISPALGVLPALIAKSSICYNPI-IYVGMNTQ-FRAAFNRV--RNN-------ESVDNNTITNQKDITMNTSKEI-VECSFDFCRKK-RLKIKLQSNAKSKNNNNNSRNQSIADPSS---TSNGDDLDQPSPAQTVLNSTVANSGSVASFGPK--------------KRLRSDFELSVISSG---------KSILIKSNTFRSNLV                                                                                                                                                                          
ENC_culPip                                                                                 MPPWAYVATAVVLFFI-GFFGFFLNLFVIALMCKEVQV LWTPMNIILLNLV-CSDFSVSIVGNPFTL-SSAISHRWLFGR---KLCVAYGFFMSLL GITSITTLTVLSYER-FYLISRPF--SSRS------LSRRGALGAVLLIWCYSFAL-TSPPLFGWGAYVNEAANIS CSVNWETQTL--NATTYIIYLFVFGLVVPLTVIVYSYTNIIV-NMKK ------NAA-RVGR--------------INRAEKRVT-TMVAVMVIAFMVAWTPYSVFALMEQFG---PPDVIGPGLAVLPALIAKSSICYNPI-IYVGMNTQ-FRAAFNRV--RHD-------PGDMANTTTNQKELTRSSRDVASVDCSFDFCRKKNRLKMKLHNSIHRSAAIKAGRSQSPHSESE--RSSSGATQERSTRVQTMLNATVDSRSISGSTG----------------KLMKSDFELSVINSG---------KSILIKSNTFRSNLV                                                                                                                                                                          
ENC_triCas                                                                   MKNFNSTEIGDELLIPVEGYIAAAVVLFCI-GFFGFSLNLTVIIFMLKERQ- LWSPLNIILFNLV-VSDFLVSVLGNPWTF-FSAINYGWIFGE---TGCTIYGFIMSLL SITSITTLTVLAFER-YLLIARPF--RNNA------LNFHSAALSVFSIWLYSLSL-TIPPLIGWGEYVHEAANLS CSVNWEEKSP--NSTSYILYLFAFGLFLPLVIITFSYVNIIL-TMRR ------NAAFRVGQ--------------VSKAENKVA-YMIFIMIIAFLTAWSPYAIMALIVQFG---DAALVTPGMAVIPALLAKSSICYNPV-IYIGLNAQ-VKGAKW-V--SGLIYLFQFQQAWMQKWKKNRRGSDALGTSRVMLETIHQACRDEKTDKLLEKKTKFCKDFETDVSML                                                                                                                                                                                                                                                                          
ENC_apiMel                                                                 MSLNRSTMEHVIYEDQVSPVMYIGAAIALGFI-GFFGFTANLLVAIVIVKDAQ-ILWTPVNVILFNLV FGDFLVSIFGNPVAM-VSAATGGWYWGY---KMCLW YAWFMSTLGFASIGNLTVMAVER-WLLVARPM---QAL------SIR HAVILASFVWIYALSL-SLPPLFGWGSYGPEAGNV-SCSVSWEVHDPVTNSDTYIGFLFVLGLIVPVFTIVSSYAAIVL-TLKK-------VRKRA GA--------------SGRREAKIT-KMVALMITAFLLAWSPYAALAIAAQYF---N AKPSATVAVLPALLAKSSICYNPI-IYAGLNNQ-FSRFLKKIFDARGSRTAVPDSQHTALTALNRQEQRK                                                                                                                                                                                                                                                                                                                   
PERc_patYe                                                                    MPFPLNRTDTALVISPSEFRIIGIFISIC CIIGVLGNLLIIIVFAKRRS--VRRPINFFVLNLA-VSDLIVALLGYPMTA-ASAFSNRWIFDN---IGCKIYAFL-CFNSGVISIMTHAALSFCR-YIIICQ-YGY RKK------ITQTTVLRTLFSIWSFAMFW-TLSPLFGWSSYVIEVVPV-SCSVNWYGHGL--GDVSYTISVIVAVYVFPLSIIVFSYGMILQEKVCKDSRKNGIRAQQRYTP------------RFIQDIEQRVTF ISFLMMAAFMVAWTPYAIMSALAI-----GSFNVENSFAALPTLFAKASCAYNPF-IYAFTNANFRDTVVEIMAPWTTRRVGVSTLPWPQVTYYPRRRTSAVNTTDIEFPDDNIFIVNSSVNGPTVKREKIVQRNPINVRLGIKIEPRD--SRAATENTFTADFSVI                                                                                                                                                                                                                                                 
PER_homSap                                                              MLRNNLGNSSDSKNEDGSVFSQTEHNIVATYLIMA GMISIISNIIVLGIFIKYKE--LRTPTNAIIINLA-VTDIGVSSIGYPMSA-ASDLYGSWKFGY---AGCQ VYAGLNIFFGMASIGLLTVVAVDR-YLTICLPDV- GRR------MTTNTYIGLILGAWINGLFW-ALMPIIGWASYA-PDPTGATCTINW-RKNDR SFVSYTMTVIAINFIVPLTVMFYCYYHVTLSIKHHT------TSDCTESL------------NRDWSDQIDVTK MSVIMICMFLVAWSPYSIVCLWASFG---DPKKIPPPMAIIAPLFAKSSTFYNPC-IYVVANKK FRRAMLAMFKCQTH----QTMPVTSILPMDVSQNPLASGRI                                                                                                                                                                                                                                                                                                              
NEUR_galGa                                                      MASDCNSSSQEEYLPHYMQQEDPFASKLSREADIIAGFYLTVI GILSTLGNGYVIFMSSKRKK--KLRPAEIMTVNLA-VCDLGIS VGKPFSI-ISFFSHRWIFGW---MGCR-WYGWAGFFFGCGSLITMTAVSLDR-YLKICHLAY- GTW------LKRHHAFICLALIWAYATFW-ATVPFAGVGSYA-PEPFGTSCTLDWWLAQASVAGQAFVLSILFFCLLFPTAVIVFSYVKIILKVKSST----KEVAHYDTRI------------QNSHILEMKLTK VAMLICAGFLIAWIPYAVVSVWSAFG---QPDSVPIQFSVVPTLLAKSAAMYNPI-IYQVIDCKFACCRSGGPKTLQKK----SSLKES YTISSHRDSAALSGTQLEV                                                                                                                                                                                                                                                                                                           
MEL1_homSa   MNPPSGPRVPPSPTQEPSCMATPAPPSWWDSS------QSSISSL-GRLPSISPT APGTWAAAWVPLPTV-------DVPDHAHYTLGTVILLVGLTGMLGNLTVIYTFCR SR-SLRTPANMFIINLA-VSDFLMSFTQ-APVFFTSSLYKQWLFGE--T GCEFYAFCGALF-GISSMITLTAIALDR-YLVITRPLATFGVA------SKRRAA-FVLLGVWLYALAW-SLPPFFGW SA-YVPEGLL-TSCSWDYMSFTPAVRAYTMLLCCFVFFLPLLIIIYCYIFIFRAIRETGR ALQTFGACKGN---G--ESLWQRQRLQ--SECKMAK-IMLLVILLFVLSWAPYSAVALVAFAG-- YAHVLTPYMSSVPAVIAKASAIHNPI-IYAITHPKYR VAIAQHLPCLGVLLGVSRRHSRPYPSYRSTHRSTLTSHTSNLSWISIRRRQESLGSESEV GWTHMEAAAVWGAAQQANGRSLYGQGLEDLEAKAPPRPQGHEAETPGK TKGLIPSQDPRM                                                                                                                                                                                                                           
MELa_ raFl   MTELPSFQPPTNSTEEENAVFPTALTEWISE VGNQVGEAALKLLSGEGDGMEVTPTPGCTGNASVCNGTDSGGGVVWDIPPLAHYIVGTAVFCVGCCGMFGNAVVVYSFIK SK-GLRTPANFFIINLA-LSDFLMNLTN-MPIFAVNSAFQRWLLSD--F ACELYGFAGGLF-GCLSINTLMAISMDR-YLVITKPFLVMRIV------TKQR VMFAILLLWIWSLVW-ALPPLFGW-SA-YVPEGF GTSCTFDYMTPKLSYHIFTYIIFFTMYFIPMGVIIYCYYNIFATVKSGDKQFGKAVKEMAHEDVKN--K AQQERQRK--NEIKTAK-IAFIVITLFLSAWTPYAVVSALGTLG--YQ-DLVTPYLQSIPAVFAKSSAVYNPI VYAITHPKFR-AAVKKHIPCLSGCLPADEEET-KTKTRGATTTASMSMTQTTAPTV HDPQASVHSGSSVSVDDSSGVSRQDTMMVK VEVDNRMEKAGGGAADTAPKDGTSVPTVSAQIEVRPSGNVNTKA---EVIPSPQSAAVAHGASASPVPK VAELSSSVSLESAAIPGKIPTPLPS
MEL2_galGa                                                                      MGTQPHSVTKSEIPDHVLYTVGTCVLVIGSIGIIGNLLVLYAFYS NK-KLRTPQNFFIMNLA-VSDFLMSASQ-APICFVNSLHREWILGD--I GCDLYAFCGALF-GITSMMTLLAISVDR-YLVITKPLRSIQWT------SKKRTIQI-IAAVWLYSLGW SVAPLLGW-SS-YVPEGLM-ISCTWDYVTYSPANRSYTMILCCCVFFIPLIIILHCYLFMFLAIRSTGR DVQKLGSCSRKSFLS--QSM------K--NEWKLAK-IAFVVIIVYVLSWSPYACVTLIAWAG-- RGNTLTPYSKSVPAVIAKASAIYNPI-IYAIIHPRYRK TIHNAVPCLRFLIRISKNDLLRGSINESSFRTSLSSHQSLAGRTKNTCVSSVSTGEA NWSDVELDTVEPAHEKLQPRRSHSFSSSLRQKRDLLP-DSYSCSEETEEK VSLSSSYLEKVLGRSAFPSSPVALVTSSLRAASLPVGLNSSSASRGAGSDISQMKTEESHNN  
MEL1_strPu                                                                                              MNAVTTALPHGLNKPTIEAR WTK-SLRTPPNMLIVNLA-ISDFGMVITN-FPLMFASTIYNRWLFGD--A GCQFYAFCGALF-GIMSIANMTAIALDR YYVICWSLEAVRSV------THRRSM-IIIIIVWCYAIFW-SIPPFFGV-GS-YVLEGY-GLGCTFDFMTKDLNHYLHVSFLFASSFVVPVTIIIVCFTRIAITVRAHRHELNKMRTKLTEDKDKKHKSSIRRANKAK--TEFQIAK-VGFQVTIFYVLSWMPYSIVAVIGQY---FDSDLLTPLGTVVPVIFAKCSAIWNPI-IYCLSHEKFNAALKEKLMGMCGIEIPSKHRSM---GSQESSVTGRRGMHRQNSSTLSESSVTSTVDQDAIELKDRKQGPATVKVQQEKVEG----GTYRRNPGDVTFSKDAGVEVDEKRRGDQGQRDDRVRPQGEGQMDQWSQPPPAPASASAPTP-GVNDKEYLTKM                                                                                                                                                                                        
RHO_plaDum                                   MSRSEVLVPGSMSLDGLLTTAHPIGNDSI ETILHPYWQQF-DIENTIPDSWHYAVAAWMTFFGILGVSGNLLVVWTFLK TK-SLRTAPNMLLVNLA-IGDMAFSAINGFPLLTISSINKRWVWGK--L WRELYAFVGGIF-GLMSINTLAWIAIDR-FYVITNPLGAAQTM------TKKRAF-IILTIIWANASLW-ALAPFFGW -GAYIPEGFQ-TSCTYDYLTQDMNNYTYVLGMYLFGFIFPVAIIFFCYLGIVRAIFAHHA EMMATAKRMGANTGK--ADA----DKK--SEIQIAK-VAAMTIGTFMLSWTPYAVVGVFGMIK PHSEMFIHPLLAEIPVMMAKASARYNPI-IYALSHPKFR AEIDKHFPWLLCCCKPKPKAQLPSSTTKGSIASKTEADTSV                                                                                                                                                                                                                                                                                                            
MEL1_capCa                                        MMSYADGYLDNS-TAPIEESPYLPHGTFFHPHWRPYREMLLNMNPLIYYGLGLYMAVVGIVGTLGNLVVITLFIK ---SLRTPPNMFIINLA-LSDMGFCATNGFPLMTVASFQKLWRWGP--V ACELYALAGSIT-GFNSIATLALISMDR-YMVIAKPFYAMKHV------SHKRSL-IQIILAWTWAFIW-SAPPLLRMGYGRYIPEGFQ-VSCTFDYLSRDLKNLIFVWCLFVFGFFIPVLAIACSYVGIIRAVGAQ-SKEMRKTAEKMGAKTGK--SDK----EKK--QDIAMAK-VAAGTIGLFLMSWTPYAAVSMIGIAG---NRSWITPYVSQIPVMFAKASAMWNPI-LYALSHPKFRAA-LEDHMPWLLVC                                                                                                                                                                                                                                                                                                                                        
RHA _schMe                                                                                    EVYHYLVGVYISIVGISGVLGNLLVLYIFAR AK-SLRTPPNMFIMSLA-IGDLTFSAVNGFPLLTISSFNTRWAWGK--L TCEIYGFIGGLF-GFISINTMALISLDR-YFVIAQPFQTMKSL------TIKRAI-IMLVFVWLYSLIW-STPPFFGY GN-YVPEGFQ-TSCTFDYLTQSKGNIIFNIGMYIGNFIIPVGIIIFCYYQIVKAVRVHELEMLKMAQKMNASHPTS--MKTG-- AKK--ADVQAAK-ISVIIVFLYMLSWTPYAIIALMALTG---RRDHLNPYTAELPVLFAKTSAMYNPF-IYAINHPKFRIQ-LEKKFPCLICCCPPKPK                                                                                                                                                                                                                                                                                                                                  
MEL1_schMa                                                MKQNLTFATLWPDDNDFASIVHSHWHKF----IQPDPLYYYLVGIYIGIVGILAVMGNSLVITLFLL CK-QLRTPPNMLIVSLA-ISDFSFALINGFPLKTIAAFNHRWGWGK--L ACELYGFAGSIF-GFISLTTMAFIALDR-YLVIVQPFETFSRI------TYGKVI-VMIFITWIWSALW-SIPPFFGY GS-YIPEGFH-TSCTFDYLSTDLPNLIFNAGLYILGFLCPVFIIIFSYYQIVKTVRLNELELMKMAQSLDLQNPSA--MKTG- GDKK--ADIEAAK-TSIILVLLYLMSWSPYAIVCLMTLIG---SRDSLTPFHSELPVLFAKTSAVYNPI-VYAVKHPKFRME-IEKRFPFLICCCPPKPK ERLQNTIVSKIQV-SQIGIGTVSGGNENTLNTVKRED                                                                                                                                                                                                                                                                                            
MEL2_schMa                                                MSSNRTIEMLRPYMKDFDSIVLPYWYKF----EQPNPYYQYAIGLFIAVVGITGMCLNLLVIVFFTM FK-SLRTPSNILVVNLA-ISDFGFSAVIGFPLKTMAAFNNFWPWGK--L ACDLYGLAGGLF-GFVSLSTIAAVALDR-YLVIATPFESVFQT------TPRRTL-LLMLFLWMWSLMW-TIPPLFGF GKRYVTEGYQ-TSCTMDYISTDLNNRLFNIGLFGFGFLCPLFLSLFCYARIILIVRSRGKDFIEMAASSKGTNQKE--KSANV SSSK--SDTFVSK-SSAILLGVYLICWTPYSFVCLMALIG---YADYITPLMVEIPCLCAKTA---NPC-IYAFRYPKFRSL-LQQRFGFLRLTKNRVSY ERSQHAILSTIHVVTDCQYGTVSGGNENTLNTFILLD                                                                                                                                                                                                                                                                                            
MEL3_schMa                                     MSEIKNFTRSLLLYNRTFSMIKNNIHDSDIIMLNHWIKY----TQPDPIYNYLVAIFVALIGIFGTITNLLVIFVFL TPK-SSISLQCALIINLA-ISDFGFSAVIGFPLKTIAAFNQYWPWGS--V ACQLYGFISATF-GFLSLTTIAAISFDR-YLVIVKDHKT---T------NFRVIC-TVIGFLWIWSIIW-TIPPFFGF GR-YVLEGYQ-TSCTFDYISNDMPSLLFSGGMYIFGFMFPVLLCIYCYVNLLKIVRNNERVVL-ISLSNDGASKQR--ESVR- NRKR--LDIEATK-SVILSLLFYLMSWTPYAMVCLISILG---QSYFLTPTIAEMPHIFAKMAAIYNPI-LYAFTNRKFKNA-LGIRKTSSVIMQQQRLLSKGQLKPLVSLLFLVN                                                                                                                                                                                                                                                                                                                  
MEL1_helRo                                                           MDSVTWYKDFHPHWWKYHDIIVNAPMAFYYFLGTFFAVVGFLGVFGNIIVVWVFSR TP-SLRTPSNVLVINLA-ICDILFSALIGFPMSALSCFQRHWIWGN--F--CQFYSFVAGIT-GLASINCLAVIAVDR-YLVVGQPLAMLNQS------HFRRSF-YHVLIIWTWACVW-SAMPLIGW--GEYILEGFG-VSCTFDYLTRTTWNISFNVCLFTFCFGMPVSVIILSYIGIIRSIAKNRKEFSSLTAENSS--------------RAR--QEIKIAK-VFAVCMTAFILCWVPYATVAQLGIYG---YDQMVSPYTAELPVMLAKTSALWNPI-IYAFSHPKYRK--CLKELP--IFRQKRKFRFLGSKSHTKSSDGVGTIALTSTINRTATRH                                                                                                                                                                                                                                                                                                    
MEL_aplCal                                                MNVSS-----SLTSQPYHELLHPHWLEH----EEAPEGVHLSVGVFITLVGVLAVCGNSLVIITCIR FK-DLRTRSNILIINLA-VGDLLMCLID-FPLLAAASFYGEWPYGR--Q VCQMYAFLTAIA-GLVTINTLAVIAADRYWAVVRRP--TPGQK------LPKCVTSIAVASVWAYSISW-ALCPILGW--GAYVLDGIR-TTCTFDFLTRTWENRSFVIGMMIGNFVLPFALMVFSYFRIWVAVRKVKSG NVFCAIRHNYNLALGSTLFVKQHRYRLHCEQKTVK-IIMFLLIAFTVSWSPYLAVSIIGLFG---DRSQLTYQNTLTASLIAKTSMVFNPI-LYSISHPKVRKRIANLACCYSVRRHQQQTSRIKTGRRSTSSATPSRS                                                                                                                                                                                                                                                                                                                 
MEL_lotGig                                                MSIASHVWTNSSTNHFNFSVLHQHWQNQ----TPLSTACQYTIGIFISTVAVIAVIGNSIVIWAHVR IK-SLSTTSNMLILNLC-VGCLIMCIVD-FPLYATSSFLQKWIFGH--K VCEIYATITGTA-GLLIMNSYSAIAFDRFITVTRYN--NPNYP------RSKSATMCISGFVWIYSLSW-SMAPVVGW--SRYQLDGSG-TTCTFDYLSTTWTNRSFILSIAFFNFVLPLCFILFAYSRI---LHLISSHSREMKSYRSAVIISKGKASIPKRFR----SERKTAI-TLLITVVVFCLSWVPYVIIALIGQFG---NQSFITPQISVIPQLVAKLSTVTNPI-LYSLSHPVVRNKLFLRLRHELYRRPSDSVSSSRGIQMKNIEFI                                                                                                                                                                                                                                                                                                                     
MEL _ raFl                                                          MSPNLTNTSLLPNRTDRPE--LSPADVTMQLVFGSMMLVFGLIGVVGNAVALYAFCR SR-SLRRPKNYLIANLC-LTDMVVCLVY-SPIIVTRSL-SHG LPS--KESCIVEGFVVGLG-SIVSICSLAGIAVER-YVTITQPIKSLSIL------THRALL-GAVSAVWVYAFLL-AFPPLVGW--GRYVSEESK-ISCTFDYLSTDDATRAHVIVLVIGAFGLPFSVITYCYVRSFATVRKCTKERKQMSPLAKSDSR----------------SEVKAAV-NSFVITTSFCLCWCPYAVVATMGVSG-----FTVHSHAVFIAALLAKLSVLFNPV-AYVLSIP NSNVNIES--TELTVPYSASRESCLLSRAATERLAGRSPSLTDIVREFGLQQTASHRE                                                                                                                                                                                                                                                                                              
LWS_ixoSca                                  MGSEGQRTNMSLLDELASPYMKNGTLVESVPDEMLYMVHPHW-YNFKPMNPLWHSLLGFAMVILGVISVVGNSMVIYIM-TTSK-SLRSPTNMLVVNLA-FSDW CMMAFMMPTMAANCFAETW-ILG--PFMCEVYGMVGSLF-GCGSIWSMVMITLDRYNVIVRG--VAAAPL------THKRAA-LMIFFVWFWALTW-TLLPFFG-WSR YVPEGNM-TSCTIDYLTKALWSASYVVAYAGGVYWTPLFINIYCYSKIVR-AVAQHEKQLRLQ-ARKMNVA----SLRANAEQTKTSAEARLAK IALMTVGLWFMAWTPYLTIAWAGIFS--DGS-KLTPLATIWGSVFAKANACYNPI-VYGIS-HPKYRAALARRFPSLVCMPPGGDQLDTRSEASGITTIEDKVMTTET                                                                                                                                                                                                                                                                                                            
LWS1_apiMe                   MDTLNITTSFFIEVMPSNISTLTTTGPQFARQLMRFNNQTVVSKVPEEMLHLIDLYW YQFPPLDPLWHKILGLVMIILGIMGWCGNGVVVYVF-IMTP-SLRTPSNLLVVNLA-FSDFIMMGFMCPPM-VICCFYETW VLG--SLMCDIYAMVGSLC-GCASIWTMTAIALDRYNVIVK -GMSGTPL------TIKRAM-LQILGIWLFGLIW-TILPLVG-WNR YVPEGNM-TACGTDYLSQDWTFKSYILVYSFFVYYTPLFTIIYSYYFIVS AVAAHEKAMKEQ-AKKMNVT----SLRSG-DNQNTSAEAKLAK VALTTISLWFMAWTPYLVINYIGIF---NRS-LITPLFTIWGSLFAKANAIYNPI-VYGIS HPKYRAALKEKLPFLVC--GSTE--DQTAATAGDKASEN                                                                                                                                                                                                                                                                                                                   
LWS2_apiMe                                  MIAVSGPSYEAFSYGGQARFN-NQTVVDKVPPDMLHLIDANW-YQYPPLNPMWHGILGFVIGMLGFVSVMGNGMVVYIF-LSTK-SLRTPSNLFVINLA-ISDFLMMFCMSPPM VINCYYETW-VLG--PLFCQIYAMLGSLF-GCGSIWTMTMIAFDRYNVIVKG--LSGKPL------SINGAL-IRIIAIWLFSLGW-TIAPMFG-WNR YVPEGNM-TACGTDYFNRGLLSASYLVCYGIWVYFVPLFLIIYSYWFIIQ-AVAAHEKNMREQ-AKKMNVA----SLRSS-ENQNTSAECKLAK VALMTISLWFMAWTPYLVINFSGIF---NLV-KISPLFTIWGSLFAKANAVYNPI-VYGIS HPKYRAALFAKFPSLAC--AAEPSSDAVSTTSGTTTVTDNEKSNA                                                                                                                                                                                                                                                                                                             
LWS_triCas                                  MSVMGEPNFIAWAAQRSGYGGGNLTVVDKVLPDMLHLVDAHW-YQFPPMNPLWHGILGFVIGVLGFVSIVGNGMVIYIF-SSTK-ALRTPSNLLVVNLA-FSDFLMM CMSPAM-VINCYNETW-VLG--PLVCELYGMSGSLF-GCASIWTMTFIALDRYNVIVKG--LSAQPL------TKKGAM-LRILIIWVFSTLW-TIAPFFG-WNR-YVPEGNM-TACGTDYLTKDWVSRSYILVYAVWVYFVPLFTIIYSYWFIVQ AVAAHEKSMREQ-AKKMNVA----SLRSS-EAAQTSAECKLAK-IALMTITLWFFAWTPYLVTNFTGIF---EGA-KISPLATIWCSLFAKANAVYNPI-VYGIS HPKYRQALQKKFPSLVC--AGEPD-DTTSTASGVTNVTTDEKPATA                                                                                                                                                                                                                                                                                                            
LWS_pedHum                              MVKSQELLMVRPTQNHHFWSQGLNFGNQTVVDKVLPEMLHMIDPHW-TGFPPMNSLWHGILGFVIGVLGAVSLIGNGVVIYIF-TTTK-NLRTPSNLLVVNLA-FSDFFMMFCMSPIM-VVNCYNETW-IMG GPFMCEIYGLLGSLF-GNASIWTMTMIALDRYNVIVKG GLSAKPM------TIKMAL-LNILFVWIQALLW-TILPLFG-WNR YVPEGNM-TACGTDYLTKDWFSRSYILVYSIFVYFLPLFIIIYSYIFIIQ AVIDHENNMRMQ-AKKMEVA----SLRSQ-DDKKKSVEIKLAK IALMTIALWFFAWTPYLVINYAGIF---ETL-TISPLFTIWGSVFAKANAVYNPI-VYGIS HPKYRAALKKKFPSLVC--GTLED-DHTSAASGITTCNEENTTKV                                                                                                                                                                                                                                                                                                             
MWS_droMel                                    MES FAVAAAQLGPHFAPLSNGSVVDKVTPDMAHLISPYW-NQFPAMDPIWAKILTAYMIMIGMISWCGNGVVIYIF-ATTK-SLRTPANLLVINLA-ISDFGIMITNTPMM-GINLYFETW-VLG--PMMCDIYAGLGSAF-GCSSIWSMCMISLDRYQVIVKG--MAGRPM------TIPLAL-GKIAYIWFMSSIW-CLAPAFG-WSR YVPEGNL-TSCGIDYLERDWNPRSYLIFYSIFVYYIPLFLICYSYWFIIA AVSAHEKAMREQ-AKKMNVK----SLRSS-EDAEKSAEGKLAK-VALVTITLWFMAWTPYLVINCMGLF---KFE-GLTPLNTIWGACFAKSAACYNPI-VYGIS HPKYRLALKEKCPCCVF--GKVDDGKSSDAQSQATASEAESKA                                                                                                                                                                                                                                                                                                               
 UV_droMel                                 MEPLCNASEPPLRPEARSSGNGDLQFLGWNVPPDQIQYIPEHW-LTQLEPPASMHYMLGVFYIFLFCASTVGNGMVIWIF-STSK-SLRTPSNMFVLNLA-VFDLIMCL--KAPIFIYNSFHRGF-ALG--NTWCQIFASIGSYS-GIGAGMTNAAIGYDRYN-VITK--PMNRNM------TFTKAV-IMNIIIWLYCTPW-VVLPLTQFWDRFVP EGYL-TSCSFDYLSDNFDTRLFVGTIFFFSFVCPTLMILYYYSQIVG-HVFSHEKALREQ-AKKMNVE----SLRSNVDKSKETAEIRIAK-AAITICFLFFVSWTPYGVMSLIGAFG--DKS-LLTPGATMIPACTCKLVACIDPF-VYAISHPRYR-LELQKRCPWLG--VNEKSGEISSAQSTTTQEQQQTTAA                                                                                                                                                                                                                                                                                                               
 UV_apiMel                                       MSNDSIHWEARYLPAGPPRLLGWNVPAEELIHIPEHW-LVYPEPNPSLHYLLALLYILFTFLALLGNGLVIWIFCA AK-SLRTPSNMFVVNLA-ICDFFMMI--KTPIFIYNSFNTGF-ALG--NLGCQIFAVIGSLT-GIGAAITNAAIAYDRYS TIAR--PLDGKL------SRGQVI-LFIVLIWTYTIPW-ALMPVMGVWGRFVP-EGFL-TSCSFDYLTDTNEIRIFVATIFTFSYCIPMILIIYYYSQIVS-HVVNHEKALREQ-AKKMNVD----SLRSNANTSSQSAEIRIAK AAITICFLYVLSWTPYGVMSMIGAFG--NKA-LLTPGVTMIPACTCKAVACLDPY-VYAISHPKYR LELQKRLPWLE--LQEKP--ISDSTSTTTETVNTPPASS                                                                                                                                                                                                                                                                                                              
UV1_pedHum                                 MKITTESENNISLSYYQPF IDKEESLIWNVDPSELVHIPDHW-FNFSAPHPLSNYLLGFLYFIFFVISCTGNGIVIWIFTT SK-NLRTASNVFVVNLA-IFDFIMMA--KTPIMIYNSMNLGF-ECG--FVWCQIFASAGALS-GIGASITNTCIAYDRCE TITN--PLQ-KS------GKKKAF-LLAAFTWIYALPW-AVLPFLEIWGKFAP-EGYL-TTCTVDYLTDTSQTRMFIVTIFFAAYVLPLSLIIYFYTKIVL-HVINHEKSLKAQ AKKMNVE----SLRSDGNKN-YAVEIRITK-VAIAMCFLFVISWTPYAVVALIGCFG--NK HLITPLVSMIPACACKAVACIDPY-IYAISHPRFR VEVNKRFACLAGCLQEKELQDDAVSKNTVNAENVDT                                                                                                                                                                                                                                                                                                                 
 LU_apiMel                             MLLHNKTLAGKALAFIAEEG YVPSMREKFLGWNVPPEYSDLVHPHW-RAFPAPGKHFHIGLAIIYSMLLIMSLVGNCCVIWIFST SK-SLRTPSNMFIVSLA-IFDIIMAF--EMPMLVISSFMERM-IGW--EIGCDVYSVFGSIS-GMGQAMTNAAIAFDRYR TISC--PIDGRL------NSKQAA-VIIAFTWFWVTPF-TVLPLLKVWGRYTT EGFL-TTCSFDFLTDDEDTKVFVTCIFIWAYVIPLIFIILFYSRLLS-SIRNHEKMLREQ AKKMNVK----SLVSNQDKER-SAEVRIAK-VAFTIFFLFLLAWTPYATVALIGVYG--NR ELLTPVSTMLPAVFAKTVSCIDPW-IYAINHPR YRQELQKRCKWMGIHEPETTSDATSAQTEKIKTDE                                                                                                                                                                                                                                                                                                                    
 LU_droMel                             MHINGPSGPQAYVNDSLGDGSVFP------MGHGYPAEYQHMVHAHW-RGFREAPIYYHAGFYIAFIVLMLSSIFGNGLVIWIFST SK-SLRTPSNLLILNLA-IFDLFMCT--NMPHYLINATVGYI-VGG--DLGCDIYALNGGIS-GMGASITNAFIAFDRYK-TISN--PIDGRL------SYGQIV-LLILFTWLWATPF-SVLPLFQIWGRYQP EGFL-TTCSFDYLTNTDENRLFVRTIFVWSYVIPMTMILVSYYKLFT-HVRVHEKMLAEQ-AKKMNVK----SLSANANADNMSVELRIAK-AALIIYMLFILAWTPYSVVALIGCFG--EQ-QLITPFVSMLPCLACKSVSCLDPW-VYATSHPK-YRLELERRLPWLGIREKHATSGTSGGQESVASVSGDTLALSVQN                                                                                                                                                                                                                                                                                                           
UV2_pedHum                                             MVVAVAEEV EEWKKLGLFDDEYLYKINKYW-MKFPPPSPMSHYFMGIIYSVIMVVGVFGNFLIIYLFLR KR-SLRTPSNVFIFNLA-VSDSLLLL--KMPVFIINSFYLGPALGN--L GCSAYGFVGGLT-GTVSIMTLAAIAFDRYQVIVH----PLERK------TKAAVY-FQILLIWIYAF-F-SIIPLLDVGLNKYVPEGYL-TSCSFDYLTQDTASRLTIFVFFVAAWIVPLSIILGSYMALY--------KVVLKARGTHFNTV----MTRHCKDIEIQRPELKAAV-TVICIVCLWTLSWTPYAV-ALLGITG--NEK-YISPMSSMIPALFCKTASCIDPF-VYAATN-RRFRNELKRKYRKRSRYQPSLKTERKDFFTLSEDNNDRGKGNTIRIREK                                                                                                                                                                                                                                                                                                        
ENC_strPur                                                                                                                                           MSLATKKHFIRNAVEEGGHLLEK--WDKGG YAFIMTFL-GLNSLMSHAVIAVDR-YLVITKPHF GIVV------TYPKAF-LMISIPWVFSFAW-AVFPLAGWGEFTYEGTGAW-CSVRWD--SDQPQIMSYVLAMMFLTFISSIVIMMYCYICIFLTTRRMPRWATSNSIKTHERN-------------RRRR EQKLLKTLIAIAI-AFLVAWSPYAITSMIVVFG---GSELLSLTATTLPSLFAKSSVMINPI-IYAVTSRVFRKSLKK MLTSFFPGCMTYIMTDKSPPSSSRPIQLGLCKYHFLY                                                                                                                                                                                                                                                                                                           
RGR_galGal                                                                       MVTSHPLPEGFTEIEVFAIGTALLVE ALLGFCLNGLTIISFRKIKE--LRTPSNL-LVLSIALADCGICINAFIAAF-SSFLR YWPYGS---EGCQIHGFQGFLT-ALASISSSAAVAWDR--Y---HHYCT RSK------LQWSTAISMMVFAWLFAAFW-ATMPLLGWGE-YDYEPLRTCCTLDYSKGDR -NYITFLFALSIFNFMIPGFIMM---------------------TAYQSIHQ--------KFKKSGHYK FNTGLPLKTL-----VICWGPYCLLSFYAAIE--NVMFISPKYRM IPAIIAKTVPTVDSF-VYALGNENYRGGIWQFLTGQKIEKAEVDSKTK                                                                                                                                                                                                                                                                                                                                
PER_lotGig                                                                            MTAAEFSSFEHSIVGITYMVI GISGTLLSLLVALTFIREKG--LFKYGRAWLHISLAIANVGVVGAFPFSGS-SSFSGR WLYGS---GMCTFYGFIGMFF-GIAAIGNVFALCVER--YLVSKKKDS VDK------VSNQFYWMITALVWINAFFW-GIMPALGWTS YDIEPSGTSCTIKWQNYDS--GYPSFMAMLSLTCFLIPLPVALICLIL----------------SGTDKITE--------DKEEKTYFREDQLRS TCTFLLILALIGWGPYCFICIWALFA--DTTQVS-MLAAVIPPLAAKTMVLLYPV-AYCQGNKRFKNAFLGMFIFNESPKQQ                                                                                                                                                                                                                                                                                                                                      
PER_aplCal                                                      MNDDGNALDGAGVDPAATAAHVSTALGFTKFEHASVGALYMLF CIVGVTLNLLTALTFYKDTK--LTKGSQPWLHILLALANVGVVAPSPFPAS-SSFSGR WLYGS---TMCQIYAFEGMFI-GIAAIGAVIALCIER--YIACQRSGA NDT------QGWFYGWSIT-LVLGNALFW-AIMPLLGWSR YSVEHTGTSCSIDWKNPDE--SFVSYIMTLEVFSFGIPMMSAFFCLISASPRPQPQGGATAAAGSGQEQQGE--------DTACKGCFSEDQLRL LCYVFIGFVLVGWGPFAYLCTLAVFS--DARGIS-MLAAAIPPLACKAMVSAYPL-AYAVVSPRFRQSFLALIGGGEKKKE                                                                                                                                                                                                                                                                                                                                       
ENC_dapPul                                                                                                           MPVWVYWSASAYLLFISIAGLFMNIVVVVIILNDSQVI PISAAAALKFTWPFSH---ELCVAYAMIMSTA -IGSITTLTVLALWRCQHVVWCPTNRNSNFTDPNGRLDRRQGALLLTFIWTYTLIV-TCPPLFGWGRYDREAAHIS CSVNWESKMD--NNRSYILYMFAMGLFIPLMAIFVSYISILLFIHK SQQTS----------------------NNSDTVEKRVTF-MVAVMIGAFLTAWTPYSIMALVETFT---GDNVISPAVATVPSLFAKTSAVLNPL-IYGLLNTQVGSAD                                                                                                                                                                                                                                                                                                                                                   
ENC_ omMor                                                                                                                        LWTPLNIILFNLVCSDFSV--SVLGNPFTLISALFHRWIFGH---TMCVLYGFFMALL GITSITTLTVISFERYLMVTRPLTSRH---------LSSKGAVLSIMFIWTYSLAL-TTPPLLGWGNYVNEAANIQ CSVNWHEQST--NTLTYIMFLFAMGQILPLSVITFSYVNIIRTLKR NSQRL----------------------GRVSRAEARATA-MVFIMIIAFTVAWTPYSLFALMEQFA---TEGIVSVLKV                                                                                                                                                                                                                                                                                                                                                                                 
MEL2_helRo                                                                                                                     TPILRTHANVLIINLA-LCDLIFSSLIGFPMTALSCFKRHWIWGD--L