# HG changeset patch
# User greg
# Date 1493403826 14400
# Node ID f174450ebc44e221f674d447b8710eab19ff83d6
# Parent 73db26d39092153f939737e8129299db969f849d
Uploaded

diff -r 73db26d39092 -r f174450ebc44 kaks_analysis.xml
--- a/kaks_analysis.xml Tue Apr 11 13:36:06 2017 -0400
+++ b/kaks_analysis.xml Fri Apr 28 14:23:46 2017 -0400
@@ -79,17 +79,17 @@
 ]]>
-
-
+
+
-
-
-
+
+
+
-
-
+
+
@@ -100,27 +100,27 @@
-
+
-
+
-
+
-
+
-
+
@@ -130,33 +130,33 @@
-
+
-
+
-
+
-
+
-
+
-
+
@@ -195,9 +195,11 @@
-This tool is one of the PlantTribes collection of automated modular analysis pipelines that utilize objective classifications of
-complete protein sequences from sequenced plant genomes to perform comparative evolutionary studies. This tool performs orthologous
-or paralogous ks analyses of coding sequences and amino acid sequences.
+This tool is one of the PlantTribes collection of automated modular analysis pipelines for comparative and evolutionary analyses
+of genome-scale gene families and transcriptomes. This tool estimates paralogous and orthologous pairwise synonymous (Ks) and
+non-synonymous (Ka) substitution rates for a set of gene coding sequences either produced by the AssemblyPostProcessor tool or
+from an external source. Optionally, the resulting set of estimated Ks values can be clustered into components using a mixture
+of multivariate normal distributions to identify significant duplication event(s) in a species or a pair of species.
 -----
@@ -205,26 +207,91 @@
 * **Required**
- - **Coding sequences (CDS) fasta file for species1** - Coding sequences (CDS) fasta file for species1.
- - **Aamino acids (proteins) sequences fasta file for species1** - Aamino acids (proteins) sequences fasta file for species1
- - **Select method for pairwise sequence comparison to determine homolgous pairs** - Pairwise sequence comparison to determine homolgous pairs (cross species comparison requires selection of inputs for species2).
+ - **Coding sequences for the first species** - coding sequence fasta file for the first species, either produced by the AssemblyPostProcessor tool or an external source selected from your history.
+ - **Protein sequences for the first species** - corresponding protein sequence fasta file for the first species, either produced by the AssemblyPostProcessor tool or an external source selected from your history.
+ - **Type of sequence comparison** - pairwise sequence comparison to determine homologous pairs. This can be either paralogous for self-species comparison or orthologous for cross-species comparison. Cross-species comparison requires data selected for the second species.
 * **Optional**
- - **Minimum sequence pairwise coverage length between homologous pairs** - Minimum sequence pairwise coverage length between homologous pairs (e.g., 0.5 results in 50% coverage. Legal values lie between 0.3 and 1.0.
- - **Evolutionary rate for recalibrating synonymous subsitutions (ks) of species** - (applies to paralogous ks analysis) Recalibrate synonymous subsitutions (ks) of species using a predetermined evoutionary rate that can be determined from a species tree inferred from a collection single copy genes from taxa of interest (Cui et al., 2006).
- - **Select PAML codeml control file?** - Select PAML's codeml control file from your history. This file is used to to perfom ML analysis of protein-coding DNA sequences using codon substitution models. Selecting No uses the default file which does not include input (seqfile, treefile) and output (outfile) parameters of codeml.
- - **Fit a mixture model of multivariate normal components to synonymous (ks) distribution?** - Fit a mixture model of multivariate normal components to synonymous (ks) distribution to identify significant duplication event(s) in a genome.
- - **Number components to fit to synonymous subsitutions (ks) distribution** - Number components to fit to synonymous subsitutions (ks) distribution.
- - **Lower limit of synonymous subsitutions (ks)** - Lower limit of synonymous subsitutions (ks) - necessary if fitting components to the distribution to reduce background noise from young paralogous pairs due to normal gene births and deaths in a genome.
- - **Upper limit of synonymous subsitutions (ks)** - Upper limit of synonymous subsitutions (ks) - necessary if fitting components to the distribution to exclude likey ancient paralogous pairs.
+ - **Coding sequences for the second species** - coding sequence fasta file for the second species, either produced by the AssemblyPostProcessor tool or an external source selected from your history. Required only for orthologous comparison.
+ - **Protein sequences for the second species** - corresponding protein sequence fasta file for the second species, either produced by the AssemblyPostProcessor tool or an external source selected from your history. Required only for orthologous comparison.
+ - **Alignment coverage configuration** - select 'Yes' to set the minimum allowable alignment coverage length between homologous pairs. PlantTribes uses the global codon alignment match score to determine the pairwise alignment coverage. By default, the match score is set to 0.5 if 'No' is selected.
+
+ - **match score** - number of base matches in a pairwise sequence alignment divided by the length of the shorter sequence. Positions in the alignment corresponding to gaps are not considered. The score is restricted to the range 0.3 - 1.0.
+
+ - **Species rates recalibration configuration** - select 'Yes' to recalibrate synonymous substitution rates of a species using a predetermined evolutionary rate. The recalibration rate can be determined from a species tree inferred from a collection of conserved single copy genes from taxa of interest, as described in Cui et al., 2006. Applies only to paralogous comparisons.
+
+ - **recalibration rate** - a predetermined evolutionary recalibration rate.
+
+ - **PAML codeml configuration** - select 'Yes' to enable selection of a PAML codeml control file to carry out maximum likelihood analysis of protein-coding DNA sequences using codon substitution models. The template file "codeml.ctl.args" can be found in the scaffold data installed into Galaxy via the PlantTribes Scaffolds Download Data Manager tool, and is also available at the PlantTribes GitHub `repository`_. Default settings shown in the template are used if 'No' is selected.
+
+.. _repository: https://github.com/dePamphilis/PlantTribes/blob/master/config/codeml.ctl.args
+
+ - **Rates clustering configuration** - select 'Yes' to estimate clusters of synonymous substitution rates using a mixture of multivariate normal distributions which represent putative duplication event(s).
+
+ - **Number of components** - number of components to include in the normal mixture model.
+
+ - **Lower limit synonymous substitution rates configuration** - select 'Yes' to set the minimum allowable synonymous substitution rate to use in the normal mixtures cluster analysis to exclude young paralogs that arise from normal gene births and deaths in a genome.
+
+ - **Minimum rate** - minimum allowable synonymous substitution rate.
+
+ - **Upper limit synonymous substitution rates configuration** - select 'Yes' to set the maximum allowable synonymous substitution rate to use in the normal mixtures cluster analysis to exclude likely ancient paralogs in a genome.
+
+
+ - **Maximum rate** - maximum allowable synonymous substitution rate.
- 10.1093/bioinformatics/btw412
- 10.1186/1471-2105-10-421
- 10.1093/molbev/msm088
- 10.18637/jss.v004.i02
+
+ @article{Wall2008,
+ journal = {Nucleic Acids Research},
+ author = {2. Wall PK, Leebens-Mack J, Muller KF, Field D, Altman NS},
+ title = {PlantTribes: a gene and gene family resource for comparative genomics in plants},
+ year = {2008},
+ volume = {36},
+ number = {suppl 1},
+ pages = {D970-D976},}
+
+
+ @article{Altschul1990,
+ journal = {Journal of molecular biology},
+ author = {3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ},
+ title = {Basic local alignment search tool},
+ year = {1990},
+ volume = {215},
+ number = {3},
+ pages = {403-410},}
+
+
+ @article{Katoh2013,
+ journal = {Molecular biology and evolution},
+ author = {4. Katoh K, Standley DM},
+ title = {MAFFT multiple sequence alignment software version 7: improvements in performance and usability},
+ year = {2013},
+ volume = {30},
+ number = {4},
+ pages = {772-780},}
+
+
+ @article{Yang2007,
+ journal = {Molecular biology and evolution},
+ author = {5. Yang Z},
+ title = {PAML 4: phylogenetic analysis by maximum likelihood},
+ year = {2007},
+ volume = {24},
+ number = {8},
+ pages = {1586-1591},}
+
+
+ @article{McLachlan1999,
+ journal = {Journal of Statistical Software},
+ author = {6. McLachlan GJ, Peel D, Basford KE, Adams P},
+ title = {The EMMIX software for the fitting of mixtures of normal and t-components},
+ year = {1999},
+ volume = {4},
+ number = {2},
+ pages = {1-14},}
+
diff -r 73db26d39092 -r f174450ebc44 macros.xml
--- a/macros.xml Tue Apr 11 13:36:06 2017 -0400
+++ b/macros.xml Fri Apr 28 14:23:46 2017 -0400
@@ -3,7 +3,7 @@
 0.8
- plant_tribes_assembly_post_processor
+ plant_tribes_assembly_post_processor
@@ -40,7 +40,7 @@
-
+
@@ -59,13 +59,13 @@
-
+
-
+
@@ -78,9 +78,9 @@
-
-
-
+
+
+
@@ -90,31 +90,31 @@
-
+
-
-
-
+
+
+
-
+
-
+
-
-
+
+
@@ -130,34 +130,4 @@
 url = {https://github.com/dePamphilis/PlantTribes},}
-
-
- @article{Sasidharan2012,
- journal = {Nucleic Acids Research},
- author = {2. 
Sasidharan R, Nepusz T, Swarbreck D, Huala E, Paccanaro A}, - title = {GFam: a platform for automatic annotation of gene families}, - year = {2012}, - pages = {gks631},} - - - @article{Li2003, - journal = {Genome Research} - author = {3. Li L, Stoeckert CJ, Roos DS}, - title = {OrthoMCL: identification of ortholog groups for eukaryotic genomes}, - year = {2003}, - volume = {13}, - number = {9}, - pages = {2178-2189},} - - - @article{Emms2015, - journal = {Genome Biology} - author = {4. Emms DM, Kelly S}, - title = {OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy}, - year = {2015}, - volume = {16}, - number = {1}, - pages = {157},} - - diff -r 73db26d39092 -r f174450ebc44 test-data/species1.faa --- a/test-data/species1.faa Tue Apr 11 13:36:06 2017 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,16 +0,0 @@ ->species1_11 -MGVSMGQGNPMGMHLLPSGSSSPRTSPSLRDPPLSLPVLPNSDLSVSLPDLHKLRRNEPVTSGILHVRDLSFLRPRSHNGDDDEETEEMTREQEEKYLQWRSSLVEKLAGIELNLERVKFRMSVEIPPSDDFRAMKKSWENFYASELLSSRNPVRKIAKRPDTILVRGVPSRWFAETRISSKASTLVTHTIIESC ->species1_12 -MSAAAAALRPTEPLPLPSGLSLAPRLKLLLTFFRADLSVRPVDEWQLKTALLAFLRDPPLSLPVLPDSDLSVRTLPDLHKRRRDEPVASGVLHVRDLSFLRPRRRNGDDEEEEAEEMTREQEEEKYFQWRSSLVEKLAGIELNLEGVKFRMSVEIPPSDDFRAMKKSWENFYASELLSSRNPVRKIAKRPDTILVRGVPSRWFAETRISSKASTLVTHTIFSALGKIRNLNISSDDEWGAKQDGTNKEIISGLNCKVWVQFENYDDFNSAMQALCGRSLEKEGSRLKVDYEVTWDHEGFFRNAQYEPVRSNLEERNSSAHGRKKHYTSRIESDHRKRFRD ->species1_15 -MKDGLSLSFALISSSPDSKCELLNSRPSCRAARRGESGLLIRRSYLRPCQCPFGDRMSEQQDSTSKSSSSSISSSTQESEEEVSITIGSLLAQAKNNSGHSLGRRLSQLGSIPHTPRVNGKIPNLDNATLDHERLSERLGNYGLAEFQIEGDGNCQFRALADQIFRNPDYHKHVRKLVMKQLKEFRKQYESYVPMEYKVYLKKMKRSGEWGDHLTLQAAADRFGAKICLLTSFRDTCLIEIVPRDVTPTRELWLSFWCEVHYNSLYATDDLLTRKTKKKHWLF ->species1_16 
-MSEQQDHASKSSCSSLSTSTQESEEDVTVGTLLTEAKNSGRSLGKRLSHLDSIPHTPRVNGQIPDVNNATIDHETLLERLGTYGLAEFQIEGDGNCQFRALADQIFRNPDYHKHVRKSVVKQLKEFRKHYEGYVPMEYKVYLKKMKRSGEWGDHVTLQAAADRFAAKICLLTSFRDTCLIEIVPRGATPTKELWLSFWSEVHYNSLYATEDLPNRKTRKKHWLF ->species1_21 -MAGAGAGESLDLPVVDLASSDLAAAAKSVRKACVEYGFFYVVNHGAEGLAEKVFGESSKFFEQPLGEKMALLRNRNYLGYTPLGADKLDASSKFKGDLNENYCIGPIRKEGYQNDANQWPSEENFPCWKETMKLYHETALATGKRILSLIALSLNLDVEFFDCPVAFLRLLHYPGEANESDDGNYGASAHSDYGVLTLVATDGTPGLQICREKDRCPQLWEDVHHIEGALIVNIGDLLQRWTNCVFRSTLHRVVAVGKERYSVAFFLHTNPDLVVQCLESCCSEACPPRFPPIRSGDYLEDRLRARYK ->species1_22 -MWGPHIILYLQPFFLLPSSHMSCVLGRPSAPSLDHPQQPNPPPVAPEKPPAVAKKAAEEEEEKKPPKQARRERHAWSSRSAAAEAVGLGLGGSFANRARGEQVAAGWPAWLSAVVGEAIDGWTLRRADSFEKIDKVRTPALALAIVGGGGRELSSSVLSVAQIGQGTYINVYKARDTVTGKIVALKKMGQVCFLLCKPSYRGDTAAGGRGGRRRQQQQTAALAEEESGMAGGGGGGNRLDLPVVDLASSDPRAAAESIRKACVESGFFYVVNHGVEEGLLKRLFAESSKFFELPMEEKIALRRNSNHRGYTPPYAEKLDPSSKFEGDLKESFYIGPIGDEGLQNDANQWPSEERLPSRRETIKMYHASALSTGKRILSLIALSLNLDAEFFENIGAFSCPSAFLRLLHYPGEVDDSDDGNYGASAHSDYGMITLLATDGTPGLQICREKNRNPQLWEDVHHIDGALIVNIGDLLERWTNCIYRSTVHRVVAVGKERYSAAFFLDPNPDLVVQCLESCCSESCPPRFSPIKSGDYLKERLSATYK ->species1_35 -MAAATTSRRGPGAMDDENLTFETSPGVEVISSFDQMGIRDDLLRGIYAYGFEKPSAIQQRAVLPIISGRDVIAQAQSGTGKTSMISLSVCQIVDTAVREVQALILSPTRELAAQTERVMLAIGDFINIQVHACIGGKSIGEDIRKLEHGVHVVSGTPGRVCDMIKRRTLRTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDPVRILVKRDELTLEGIKQFFVAVEKEEWKFDTLCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQKERDAIMGEFRSGATRVLITTDVWARGLDVQQVSLVINYDLPNNRELYIHRIGRSGRFGRKGVAINFVKKEDIRILRDIEQYYSTQIDEMPMNVADLI ->species1_36 -MAAATTSRRGPGAMDDENLTFETSPGVEVISSFDQMGIREDLLRGIYAYGFEKPSAIQQRAVLPIISGRDVIAQAQSGTGKTSMISLSVCQIVDTAVREVQALILSPTRELAAQTERVMLAIGDYINIQVHACIGGKSIGEDIRKLEHGVHVVSGTPGRVCDMIKRRTLRTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDPVRILVKRDELTLEGIKQFFVAVEKEEWKFDTLCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQKERDAIMGEFRSGATRVLITTDVWARGLDVQQVSLVINYDLPNNRELYIHRIGRSGRFGRKGVAINFVKKEDIRILRDIEQYYSTQIDEMPMNVADLI diff -r 73db26d39092 -r f174450ebc44 
test-data/species1.fna --- a/test-data/species1.fna Tue Apr 11 13:36:06 2017 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,16 +0,0 @@ ->species1_11 -ATGGGTGTGAGTATGGGGCAAGGAAACCCAATGGGTATGCACTTGTTGCCATCTGGCAGCTCAAGTCCGCGCACCTCGCCTTCCCTCCGCGACCCGCCCCTCTCCCTCCCCGTCCTTCCCAACTCCGACCTCTCCGTGTCCCTCCCCGACCTGCATAAGCTTCGCCGCAATGAGCCCGTCACTTCGGGCATCCTCCACGTCCGCGACCTCTCATTCCTCCGCCCCCGCAGCCACAACGGGGATGATGATGAGGAGACCGAGGAGATGACCCGTGAGCAGGAGGAGAAGTACTTGCAGTGGAGGAGCTCCCTGGTCGAGAAGCTGGCCGGGATCGAGCTCAACCTCGAGAGGGTTAAGTTTCGGATGAGCGTCGAAATCCCGCCCTCCGATGACTTCAGGGCAATGAAGAAGTCTTGGGAGAATTTCTACGCCTCCGAGCTCCTCAGTAGCAGGAATCCTGTGAGGAAGATAGCGAAAAGGCCAGACACAATTCTTGTCCGTGGTGTGCCATCCAGGTGGTTTGCGGAGACGAGGATATCATCGAAAGCCTCCACACTGGTCACACACACTATCATCGAAAGCTGC ->species1_12 -ATGTCCGCCGCCGCCGCCGCCCTCCGGCCGACCGAGCCGCTCCCCCTCCCGAGCGGCCTCTCCCTCGCGCCGCGCCTCAAGCTGCTCCTCACCTTCTTCCGCGCCGACCTCTCCGTCCGCCCCGTCGACGAGTGGCAGCTCAAGACCGCGCTCCTCGCCTTCCTCCGCGACCCGCCCCTCTCCCTCCCCGTCCTCCCCGACTCCGACCTCTCCGTGCGCACCCTCCCCGACCTGCATAAGCGCCGCCGCGACGAGCCCGTCGCCTCGGGCGTCCTCCACGTCCGCGACCTCTCCTTCCTCCGCCCACGCCGCCGCAACGGGGATGATGAGGAGGAGGAGGCCGAGGAGATGACCCGTGAGCAGGAGGAGGAGAAGTACTTCCAGTGGAGGAGCTCCCTGGTCGAGAAGCTGGCCGGGATCGAGCTCAACCTCGAGGGGGTTAAGTTTCGGATGAGCGTCGAGATCCCGCCCTCCGATGACTTCAGGGCAATGAAGAAGTCTTGGGAGAATTTCTACGCCTCCGAGCTCCTCAGTAGCAGGAATCCTGTGAGGAAGATAGCGAAAAGGCCAGACACCATTCTTGTCCGGGGTGTGCCATCCAGGTGGTTTGCGGAGACGAGGATATCATCGAAAGCCTCCACGCTGGTCACACACACTATTTTCTCGGCACTTGGTAAAATAAGGAACCTTAATATTTCTAGTGATGATGAATGGGGAGCAAAACAAGACGGAACCAATAAGGAGATTATATCTGGACTAAATTGCAAAGTGTGGGTGCAATTTGAGAACTACGACGATTTCAACAGTGCAATGCAGGCATTATGTGGACGTTCATTAGAAAAAGAAGGATCACGGTTGAAGGTAGACTATGAAGTAACTTGGGATCATGAAGGTTTCTTCCGCAATGCACAATACGAGCCTGTTCGCAGCAATTTAGAAGAGAGAAATTCATCGGCTCATGGAAGGAAGAAACATTACACATCGCGAATTGAGTCAGATCATAGAAAGAGATTTAGGGAT ->species1_15 
-ATGAAAGATGGCCTTTCTCTCTCCTTCGCTCTCATCAGCTCGAGCCCCGACAGCAAGTGTGAGCTACTGAACTCGAGACCCTCCTGTCGCGCGGCGCGGCGCGGCGAGAGTGGCCTTTTGATCCGACGAAGCTATCTAAGACCCTGCCAATGTCCATTTGGAGATAGGATGTCGGAACAGCAGGATAGTACTAGTAAAAGCTCTAGCTCAAGCATCAGCAGCAGTACACAGGAGAGCGAGGAGGAGGTATCTATAACAATAGGTAGCCTCCTCGCCCAAGCAAAGAACAACAGTGGGCATAGTCTTGGAAGGCGCCTCTCTCAATTGGGTTCAATCCCGCACACTCCTCGAGTTAATGGAAAAATCCCTAATCTTGATAATGCAACTTTGGATCATGAAAGATTGTCGGAAAGGTTGGGAAATTATGGTTTGGCCGAGTTTCAAATAGAGGGTGATGGAAATTGTCAGTTCCGAGCTTTGGCAGACCAGATATTTCGCAACCCCGATTATCACAAACATGTGAGAAAGTTAGTCATGAAACAGCTAAAGGAATTCAGAAAACAGTATGAAAGCTATGTACCTATGGAATATAAAGTCTACTTGAAGAAAATGAAAAGATCTGGGGAATGGGGGGATCATCTGACTTTACAAGCAGCTGCAGACAGGTTTGGTGCCAAAATTTGTTTGCTGACGTCATTCAGAGACACCTGCCTAATTGAGATAGTCCCCAGGGATGTGACTCCCACAAGGGAGTTGTGGCTAAGCTTCTGGTGTGAAGTGCACTACAATTCCTTGTACGCAACTGACGATCTCCTAACCCGCAAAACCAAGAAGAAGCATTGGTTGTTC ->species1_16 -ATGTCTGAACAACAGGATCATGCTAGCAAAAGTTCTTGCTCAAGTCTTAGCACCAGTACTCAGGAGAGTGAGGAGGATGTGACAGTTGGTACCCTTTTAACTGAAGCAAAGAACAGTGGACGGAGTCTTGGAAAACGCCTTTCCCACTTAGATTCTATCCCGCACACTCCTCGAGTTAATGGGCAAATTCCTGATGTTAATAATGCAACAATAGACCATGAAACATTACTGGAAAGATTGGGCACTTATGGCTTAGCTGAATTCCAAATTGAAGGAGACGGAAATTGTCAGTTCCGAGCTTTGGCAGATCAGATATTCCGCAATCCTGACTATCACAAACATGTGAGGAAGTCAGTCGTGAAGCAGCTAAAGGAATTCAGGAAACACTATGAAGGCTATGTACCGATGGAATATAAGGTGTACTTGAAGAAAATGAAAAGATCTGGAGAATGGGGAGATCATGTGACCTTACAAGCGGCTGCAGACCGGTTTGCTGCCAAGATTTGCCTGCTGACATCATTTAGAGACACATGCCTAATCGAGATAGTCCCCAGAGGTGCCACTCCCACAAAAGAGCTTTGGTTAAGCTTCTGGAGTGAGGTGCACTACAATTCCTTGTATGCAACTGAAGATCTTCCAAATCGCAAGACCAGAAAGAAGCACTGGCTGTTC ->species1_21 
-ATGGCCGGCGCCGGCGCCGGCGAGAGCCTGGACCTCCCCGTGGTGGACCTAGCGTCCTCCGACCTCGCCGCCGCCGCCAAATCCGTCCGAAAGGCTTGCGTGGAGTACGGATTCTTCTACGTGGTCAACCATGGAGCCGAGGGATTGGCGGAGAAGGTGTTCGGGGAGAGCAGCAAGTTTTTCGAGCAGCCGCTGGGGGAGAAGATGGCGCTGCTGAGGAACAGAAACTACCTGGGGTACACCCCGCTTGGCGCCGATAAGCTCGACGCCTCGTCCAAATTCAAAGGAGATCTCAATGAAAATTACTGTATCGGACCTATCAGAAAAGAAGGTTATCAGAATGATGCTAACCAATGGCCTTCTGAAGAGAATTTCCCATGTTGGAAGGAGACAATGAAGCTATACCATGAAACTGCACTTGCTACTGGTAAAAGGATACTCTCTCTAATTGCTCTGAGTTTGAATCTCGACGTTGAATTCTTTGACTGCCCAGTGGCCTTTCTTCGGTTATTGCACTACCCAGGTGAAGCTAACGAGTCCGATGATGGCAATTATGGTGCATCAGCTCACTCAGACTATGGAGTACTAACACTTGTAGCAACAGATGGCACTCCTGGGCTGCAGATATGCAGGGAGAAGGATAGGTGCCCCCAGCTTTGGGAAGACGTTCATCACATTGAAGGGGCCCTGATTGTTAATATCGGCGATTTGCTACAAAGGTGGACTAATTGTGTTTTCAGGTCTACACTGCATCGCGTTGTTGCAGTTGGTAAAGAGCGATACTCTGTGGCTTTCTTTCTTCACACAAACCCTGATTTAGTGGTTCAATGCTTGGAAAGCTGCTGCAGTGAGGCATGCCCACCGAGGTTCCCACCTATAAGGAGCGGCGACTATTTGGAAGACCGATTGAGGGCTAGATACAAA ->species1_22 -ATGTGGGGCCCACATATCATCCTCTATCTCCAACCCTTCTTCCTCCTCCCTTCCTCTCACATGAGCTGCGTCCTCGGCCGCCCCTCCGCCCCCTCCCTCGACCACCCCCAGCAGCCCAACCCCCCGCCCGTCGCCCCGGAGAAGCCGCCCGCCGTCGCCAAGAAGGCGGCCGAGGAGGAGGAGGAGAAGAAGCCGCCGAAGCAGGCTAGGAGGGAGAGGCACGCATGGTCGTCGCGGTCTGCCGCCGCCGAGGCGGTCGGCCTGGGGCTCGGGGGGAGCTTCGCCAACAGGGCGCGCGGGGAGCAGGTGGCGGCCGGCTGGCCCGCCTGGCTCTCCGCCGTCGTCGGCGAGGCCATCGACGGCTGGACCCTGCGCCGCGCCGACTCCTTCGAGAAGATCGACAAGGTACGTACTCCTGCCCTCGCGCTCGCCATTGTTGGTGGTGGGGGAAGGGAACTGAGCTCATCGGTCTTGTCGGTGGCGCAGATCGGGCAGGGGACGTACATCAACGTGTACAAGGCGCGGGACACGGTGACGGGCAAGATCGTGGCGCTCAAGAAGATGGGCCAAGTTTGCTTCCTTCTCTGTAAGCCCAGTTACCGTGGGGATACAGCCGCCGGCGGACGCGGAGGGCGGCGGCGGCAGCAGCAGCAAACCGCCGCTTTGGCAGAAGAGGAATCCGGGATGGCCGGCGGCGGCGGCGGCGGGAATCGCCTGGACCTCCCCGTGGTGGACCTCGCGTCCTCCGACCCCCGAGCCGCCGCCGAGTCCATCCGAAAGGCGTGCGTGGAGTCCGGATTCTTCTACGTGGTCAACCATGGGGTGGAGGAGGGATTGCTGAAGAGGTTGTTCGCGGAGAGCTCGAAGTTCTTCGAGCTGCCGATGGAGGAGAAGATAGCGCTGCGGAGGAACAGCAACCACCGGGGATACACCCCGCCCTACGCCGAGAAGCTCGATCCCTCGTCCAAATTCGAAGGAGACCTCAAGGAAAGTTTCTATATTGGGCCTATTGGAGATGAAGGTTTGCAGAATGATGCTAACCAGTGGCCTTCTGAAGAG
CGCTTACCAAGTCGGAGGGAGACAATTAAGATGTACCATGCAAGTGCACTGTCTACTGGCAAAAGGATACTCTCTCTAATCGCTCTGAGTTTGAATCTTGACGCTGAATTCTTTGAGAACATTGGTGCCTTCAGCTGCCCATCAGCATTTCTTCGATTATTGCACTACCCAGGTGAAGTAGACGACTCTGATGATGGCAATTATGGTGCATCAGCTCACTCTGATTATGGAATGATAACCCTCCTAGCAACAGACGGCACTCCTGGGCTACAGATATGCAGGGAAAAGAATAGGAATCCCCAGCTCTGGGAAGATGTTCATCACATTGATGGGGCCCTGATTGTTAACATTGGCGATTTGCTAGAAAGGTGGACGAATTGTATTTACAGGTCTACAGTGCACCGTGTTGTTGCAGTTGGTAAAGAGCGATATTCTGCGGCTTTTTTTCTTGACCCAAACCCTGATTTAGTGGTTCAGTGTTTGGAAAGCTGTTGCAGCGAGTCATGCCCACCGAGGTTCTCACCTATAAAGAGTGGCGACTATTTGAAAGAGCGATTGAGCGCTACATACAAA ->species1_35 -ATGGCGGCGGCCACCACGTCGCGGCGCGGCCCGGGCGCCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCGGGGGTCGAGGTCATCAGCAGCTTCGACCAGATGGGGATCCGCGACGACCTCCTCCGCGGCATCTACGCCTACGGCTTCGAGAAGCCCTCCGCCATCCAGCAGCGCGCCGTCCTCCCCATCATCAGCGGCCGCGACGTCATCGCCCAGGCCCAGTCCGGGACCGGCAAGACCTCCATGATCTCGCTCTCCGTCTGCCAGATCGTAGACACCGCCGTCCGTGAGGTGCAGGCTTTAATACTGTCACCAACTAGAGAACTTGCTGCACAAACAGAAAGAGTTATGCTGGCTATCGGTGACTTCATCAATATCCAAGTGCATGCTTGTATTGGTGGCAAAAGTATTGGTGAGGATATTAGAAAGCTTGAGCACGGAGTGCATGTGGTGTCAGGAACACCTGGCAGAGTCTGTGATATGATCAAGAGAAGGACCTTGCGTACAAGAGCCATTAAGCTCCTAATTCTGGATGAAGCTGATGAGATGTTGGGCAGAGGCTTTAAGGATCAGATATATGATGTGTACAGATACCTCCCTCCAGAACTCCAGGTTTGCTTGATCTCCGCAACTCTGCCTCACGAGATCTTGGAAATGACCAGCAAGTTCATGACTGATCCAGTTCGGATCCTTGTGAAGCGTGATGAATTGACTCTAGAGGGCATCAAACAATTCTTTGTTGCTGTTGAGAAAGAAGAATGGAAGTTTGACACGCTTTGTGATCTTTATGATACACTGACAATCACCCAAGCTGTCATTTTCTGCAACACAAAGAGAAAGGTTGATTGGCTTACGGAAAGAATGCGCAGCAATAACTTCACAGTATCAGCTATGCATGGCGACATGCCTCAAAAGGAAAGGGATGCCATTATGGGTGAATTCAGGTCTGGTGCAACCCGTGTTCTAATCACGACAGATGTGTGGGCTCGAGGCCTCGATGTTCAGCAGGTCTCTCTTGTCATAAATTATGATCTCCCAAATAATCGTGAACTTTACATCCATCGCATTGGTCGCTCTGGACGTTTTGGTCGCAAGGGTGTGGCCATCAATTTTGTCAAAAAGGAAGACATCCGTATCCTGAGAGATATCGAGCAGTACTACAGCACGCAGATTGATGAAATGCCAATGAATGTTGCTGATCTAATT ->species1_36 
-ATGGCGGCGGCCACCACGTCCCGGCGCGGCCCCGGCGCCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCCGGGGTCGAGGTCATCAGCAGCTTCGACCAGATGGGGATCCGCGAGGACCTCCTCCGCGGCATCTACGCCTACGGCTTCGAGAAGCCCTCCGCCATCCAGCAGCGCGCCGTCCTCCCCATCATCAGCGGCCGCGACGTCATCGCCCAGGCCCAGTCCGGAACCGGCAAGACCTCCATGATCTCGCTCTCCGTCTGCCAGATCGTCGACACCGCCGTCCGAGAGGTTCAGGCCTTGATACTCTCACCAACTAGAGAACTTGCTGCACAAACAGAAAGAGTTATGCTGGCCATTGGTGATTACATCAATATCCAAGTGCATGCTTGTATTGGTGGCAAAAGTATTGGTGAGGATATTAGAAAGCTTGAGCATGGAGTGCATGTTGTGTCAGGAACACCTGGCAGAGTCTGTGATATGATCAAGAGAAGGACCTTGCGTACAAGAGCCATTAAGCTCCTAATTCTGGATGAAGCCGATGAGATGTTGGGCAGAGGCTTTAAGGATCAGATATATGATGTCTACAGATATCTACCCCCAGAGCTCCAGGTTTGCTTGATCTCCGCAACTCTGCCACATGAGATCTTGGAAATGACCAGCAAGTTCATGACTGACCCAGTCCGGATCCTTGTAAAGCGTGATGAATTGACCCTAGAGGGCATCAAACAATTCTTTGTTGCTGTTGAGAAAGAAGAATGGAAGTTTGATACTCTTTGTGATCTTTATGATACACTGACAATCACCCAAGCTGTCATTTTCTGCAACACGAAGAGAAAGGTTGATTGGCTTACAGAAAGAATGCGCAGCAATAACTTCACGGTATCAGCTATGCATGGTGACATGCCTCAAAAGGAAAGGGATGCCATTATGGGTGAATTCAGGTCTGGTGCAACCCGTGTTCTAATTACGACAGATGTGTGGGCTCGAGGCCTGGATGTTCAGCAGGTCTCTCTTGTCATAAACTATGATCTTCCAAATAATCGTGAACTTTACATCCATCGCATTGGTCGCTCTGGACGTTTTGGTCGCAAGGGTGTGGCCATCAATTTTGTCAAAAAGGAAGACATCCGTATCCTGAGAGATATTGAGCAGTACTACAGCACACAGATTGATGAAATGCCAATGAATGTTGCTGATCTAATT diff -r 73db26d39092 -r f174450ebc44 test-data/species2.faa --- a/test-data/species2.faa Tue Apr 11 13:36:06 2017 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,8 +0,0 @@ ->species2_5 -MESQSAVVPLIAELPEKRGGKTLVEEVWEESKKLWEVTGPAAFTGMVLYSMTIVSQAFAGHLGDRHLAAFSIANTVISGLNFGILLGMASALETLCGQAYGAKQYSMMGTYLQRSWLVLLAFAVLLAPTYIFSGQLLMVLGQPAELSREAGLLGMYLLPLHLMFAIQLPLNKFLQCQRKNWVIALSSVLGFPVHVVATWLLAQRFQLGVLGAAMSLNLSWALITGLQLAYAVGGGCPETWRGFSSSAFMGLKDFVSLSVASGVMTCLESWYYRLLIFLTAYAKNAELAVDALSICLSWAGWEMMIHFGFLAGTGVRVANELGANNGRAAKFATIVSTTTSFLICLLISSLALIFHDKLAILFTSSEAVIDAVDGISVLLALTILLNGIQPVLSGVAVGSGWQALVAYVNIGSYYIIGVPFGVLLAWGFHYGVLGIWVGMIGGTMVQTLILSFITLRCDWNEEALKASSRMRTWSSSK ->species2_6 
-MEENRSDIPLISGSELPDRRGGGKISELAKEVWGESKKLWVVAGPAAFTRLTFYGMTVVSQAFAGHIGDLELAAFSIATTVISGLSFGFFVGMASAMETLCGQAYGAKQYHMMGIYLQRSWLILLSFAVLLTPTYIFSEQLLTALGQPAELSRQAGLVSLYMLPLHFVYAIVLPLNKFLQCQRKNWVAAVTTAAAFPVHVVATWLLVRCFRLGVFGAAMALTLSWALATVGLLSYALGGGCPETWRGFSASAFVDLKDFIKLSAASGVMLCLENWYYRILVFLTGYVKNAELAVDALSICISYAGWEMMIHLGFLAGTGVRVANELGAANGARARFATIVSMTTSFLISLFISLLILIFHDKLGMIFSSSQAVIDAVDNISFLLALTILLNGIQPVLSGVAVGSGWQALVAYVNIGSYYLIGVPFGFLLGWGLHYGVQGIWVGMIVGTMVQTLILAYITLRCDWNEEALKASTRMRRWSNSK ->species2_9 -MGTLGGHVAPGAFFFLIGLWHLFGHSRLFLLQRGSYVAPVWFPVPGVRHIELIMIIIGSVISVSMELVIVQPKHQPFDDDGTIPSVHLHNFEHASISLAWLVFAAATIHMDRVRAPMRDAVSQLAAAAAFAQQLLIFHFHSADHAGVQGRYHRLLEMVVAVTLAASLLLIPYQRSIALSLVRSASLVFQGVWFTVMGVMMWTPALVPKGCFMNDEDGLQVVRCRTDEALDRAKSLVNLQFNWYLTGTVAFVVVFYLQMAKQYQEQPQYAPLVKGGRGSDGRCTIGEVNDDEDDLEASKGGLGYIEIER ->species2_10 -MGTLVGHVAPGAGFLLIGLWQLFSHIRLFLLRPSSYSAPVWFPAPGVRHLELILIIIGAAMSILMELVIGPAKHQPFDDDGTIPSDHLHNFEHASISLALLVFAAVTIHLDRVKAPLRDAVSQLVAAAAFAQQLLIFHLHSADHMGVEGQFHWLLQTVIAVTLATTLLGIPYPRSIVVSLVRSASLVLQGVWFVVMGVMLWTPALIPKGCFLNLEEGHDVVRCRTDEALDRAKSLVNLQFSWYLTGTVVFVVLFYLQMAKLYPEEPQYLPLVKGGGGGGDDRDSRFSIGDDDHDDEDDVEAAKRGFGHVVSGTKPVEIER diff -r 73db26d39092 -r f174450ebc44 test-data/species2.fna --- a/test-data/species2.fna Tue Apr 11 13:36:06 2017 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,8 +0,0 @@ ->species2_5 
-ATGGAGAGTCAGAGCGCCGTCGTCCCGCTCATTGCCGAGCTCCCGGAGAAGCGGGGAGGCAAAACCCTGGTGGAGGAGGTATGGGAGGAGTCCAAGAAGCTGTGGGAAGTCACCGGCCCGGCCGCCTTTACGGGGATGGTACTCTACAGCATGACCATCGTCAGCCAGGCCTTCGCCGGCCACCTTGGTGACCGCCATCTCGCCGCTTTCTCCATCGCCAACACCGTCATATCTGGCCTTAACTTTGGCATTTTGCTTGGCATGGCGAGTGCGCTGGAGACATTATGCGGCCAAGCCTACGGTGCAAAGCAGTACTCGATGATGGGCACCTATCTCCAGCGCTCATGGCTCGTCCTCCTCGCCTTCGCGGTGCTCCTTGCTCCGACGTACATCTTCAGCGGGCAGCTGCTCATGGTCCTGGGCCAGCCCGCCGAGCTGTCTCGCGAGGCGGGCTTGCTCGGCATGTACCTGCTCCCGCTGCACCTCATGTTTGCCATCCAGCTGCCGCTCAACAAGTTCTTGCAGTGCCAGCGCAAGAACTGGGTCATCGCGCTGTCCTCGGTGCTGGGTTTCCCGGTGCACGTCGTGGCGACCTGGCTGCTGGCGCAGCGCTTTCAGCTTGGCGTCCTGGGCGCAGCGATGTCACTCAACCTGTCCTGGGCGCTCATCACGGGCCTGCAGCTCGCGTACGCTGTTGGCGGTGGGTGCCCAGAGACGTGGAGAGGGTTCTCGTCGTCGGCATTCATGGGCTTGAAGGACTTCGTCAGCTTGTCCGTCGCGTCGGGAGTCATGACGTGCTTGGAGAGTTGGTACTACCGGTTATTGATTTTCCTAACGGCGTACGCGAAGAACGCAGAATTGGCTGTGGATGCACTGTCTATCTGCTTGAGTTGGGCTGGATGGGAGATGATGATTCATTTCGGGTTCTTAGCAGGCACTGGGGTGAGGGTTGCCAATGAGCTAGGCGCCAATAATGGACGAGCTGCAAAGTTTGCGACGATCGTGTCCACGACGACATCATTCCTGATCTGCCTCTTAATTAGTTCACTCGCACTCATTTTCCATGACAAACTCGCAATACTGTTCACGTCTAGTGAGGCTGTGATCGATGCAGTTGACGGTATTTCTGTTCTGCTAGCCCTCACCATCCTCCTCAATGGCATCCAACCTGTGCTATCCGGAGTTGCCGTTGGTTCAGGGTGGCAAGCGCTAGTTGCGTATGTGAACATTGGGAGCTACTACATTATCGGTGTTCCTTTCGGTGTTCTGCTAGCATGGGGTTTCCACTACGGGGTCCTTGGCATTTGGGTTGGAATGATCGGTGGCACGATGGTGCAAACTCTGATTCTTTCATTTATCACCTTACGATGCGACTGGAATGAAGAGGCACTGAAAGCTTCTAGCAGAATGCGGACATGGAGCAGCTCCAAG ->species2_6 
-ATGGAGGAGAATCGGAGCGATATCCCGCTCATCTCCGGCTCCGAGCTGCCGGACAGGAGGGGAGGAGGCAAGATCTCCGAGCTTGCGAAGGAGGTATGGGGAGAGTCCAAGAAGCTGTGGGTGGTCGCCGGCCCGGCCGCGTTCACGAGGCTGACATTCTATGGCATGACCGTGGTCAGCCAGGCCTTTGCCGGGCACATCGGTGACCTCGAGCTCGCCGCCTTCTCCATAGCCACCACCGTCATTTCTGGTCTCAGCTTTGGCTTCTTTGTTGGCATGGCGAGTGCAATGGAGACGCTGTGCGGCCAAGCCTACGGTGCAAAGCAGTACCACATGATGGGCATCTACCTGCAGCGCTCGTGGCTCATCCTCCTCAGCTTCGCCGTGCTTCTTACTCCGACCTACATCTTCAGCGAGCAGCTGCTCACCGCGCTGGGCCAGCCCGCCGAGCTGTCGCGCCAGGCGGGCTTGGTCAGCCTGTACATGCTCCCGCTGCACTTCGTCTACGCCATCGTCCTGCCGCTCAACAAGTTCCTGCAGTGCCAGCGCAAGAACTGGGTCGCCGCGGTCACCACGGCCGCGGCGTTCCCCGTTCACGTCGTCGCCACCTGGCTGCTGGTGCGTTGCTTCCGGCTCGGGGTCTTTGGAGCAGCGATGGCGCTCACCCTGTCCTGGGCACTCGCCACGGTGGGTCTCCTCTCGTATGCCTTGGGCGGCGGGTGCCCGGAGACGTGGAGGGGATTCTCAGCTTCTGCCTTCGTGGACTTGAAGGACTTCATCAAGTTGTCCGCGGCGTCTGGTGTCATGCTCTGCTTGGAGAATTGGTACTACCGGATCTTGGTTTTCCTGACGGGCTATGTGAAGAACGCTGAACTGGCTGTCGATGCACTGTCCATCTGTATAAGTTATGCTGGATGGGAGATGATGATTCATTTGGGATTCTTAGCAGGCACTGGGGTGAGGGTGGCTAATGAGCTCGGTGCAGCCAACGGAGCACGAGCGAGATTTGCGACAATTGTGTCGATGACGACATCATTTCTGATCAGCCTATTCATTAGTTTGCTCATCCTGATTTTCCATGACAAACTCGGAATGATCTTCTCGTCGAGTCAGGCTGTGATTGATGCAGTAGACAACATTTCCTTTCTGCTGGCCCTCACCATCCTCCTCAACGGAATCCAACCTGTGCTCTCTGGAGTTGCTGTTGGCTCAGGGTGGCAGGCATTGGTTGCTTATGTCAACATTGGGAGCTATTACTTGATTGGTGTTCCTTTCGGTTTTCTGCTAGGATGGGGCTTGCATTATGGGGTTCAAGGAATTTGGGTCGGAATGATCGTTGGCACAATGGTGCAAACTCTAATACTGGCATATATCACTCTACGGTGTGATTGGAATGAAGAGGCATTGAAAGCTAGTACCCGAATGCGGAGATGGAGCAACTCCAAG ->species2_9 
-ATGGGCACACTAGGCGGGCACGTCGCGCCGGGCGCCTTCTTCTTCCTCATCGGCCTGTGGCATCTGTTCGGCCACAGCCGCCTGTTCTTGCTACAGCGGGGCTCCTACGTGGCTCCGGTGTGGTTCCCGGTGCCGGGCGTCCGTCACATCGAGCTCATAATGATAATAATCGGCTCGGTGATCTCCGTCTCGATGGAGCTCGTCATCGTGCAGCCGAAGCACCAGCCGTTCGACGACGACGGCACCATCCCCAGCGTCCACCTGCACAACTTCGAGCACGCGTCCATCTCGCTGGCGTGGCTCGTCTTCGCCGCCGCCACCATCCACATGGACAGGGTCCGGGCGCCGATGCGGGACGCGGTGTCGCAGCTGGCGGCCGCGGCCGCGTTCGCGCAGCAGCTGCTCATCTTCCACTTCCACTCCGCGGACCACGCGGGCGTGCAGGGGCGGTACCACCGTCTGCTGGAGATGGTGGTCGCCGTCACGCTCGCCGCCTCGCTGCTCTTGATCCCCTACCAACGGAGCATCGCGCTGAGCCTGGTCCGCTCGGCCAGCCTCGTGTTCCAGGGCGTCTGGTTCACCGTCATGGGCGTCATGATGTGGACGCCGGCGCTCGTCCCCAAAGGCTGCTTCATGAACGACGAAGATGGCCTCCAAGTCGTCCGGTGCCGCACCGACGAGGCGCTCGACCGCGCCAAGTCGCTCGTCAACCTGCAGTTCAACTGGTACCTGACCGGCACCGTGGCGTTCGTCGTCGTGTTCTACCTCCAGATGGCCAAGCAGTACCAGGAGCAGCCGCAGTACGCTCCGCTGGTGAAGGGAGGGAGAGGCAGCGATGGCCGGTGCACCATCGGAGAGGTCAATGACGACGAGGATGACCTTGAGGCCTCCAAAGGAGGCTTAGGATATATCGAAATTGAGAGG ->species2_10 -ATGGGCACTCTCGTCGGGCACGTCGCGCCGGGCGCCGGCTTCCTCCTCATCGGCCTGTGGCAGCTATTCAGCCACATCCGCCTGTTCCTGCTGCGCCCGAGCTCGTACTCTGCTCCGGTCTGGTTCCCGGCGCCGGGCGTGCGCCACCTCGAGCTCATACTCATCATCATCGGCGCGGCGATGTCCATCCTGATGGAGCTCGTCATCGGCCCCGCGAAGCACCAGCCGTTCGACGACGACGGCACCATCCCGTCAGACCACCTCCACAACTTCGAGCACGCGTCCATCTCGCTGGCGCTGCTCGTCTTCGCCGCGGTCACCATCCACCTCGACAGGGTAAAGGCGCCCCTGCGTGACGCCGTGTCGCAGCTCGTCGCCGCCGCGGCGTTCGCGCAGCAGCTGCTCATCTTCCACCTCCACTCGGCGGACCACATGGGCGTGGAGGGGCAGTTCCACTGGCTGCTGCAGACGGTCATCGCCGTCACGCTCGCCACCACGCTGCTCGGGATCCCTTACCCGCGGAGCATCGTGGTGAGCCTTGTCCGGTCGGCCAGCCTCGTGCTCCAGGGCGTCTGGTTCGTCGTCATGGGCGTCATGCTGTGGACGCCGGCGCTCATACCCAAGGGCTGCTTCCTCAACCTCGAGGAAGGGCACGACGTCGTCCGGTGCCGCACCGACGAGGCGCTCGACCGCGCCAAGTCGCTCGTCAACCTGCAGTTCAGCTGGTACCTCACCGGCACGGTGGTGTTCGTCGTCCTGTTCTACCTCCAGATGGCGAAGCTCTACCCCGAGGAGCCGCAGTATTTGCCGCTGGTGAAGGGAGGAGGCGGCGGCGGCGATGACCGCGATAGCCGGTTCAGCATCGGAGACGATGATCACGACGATGAGGACGATGTCGAGGCTGCAAAACGTGGCTTCGGACACGTGGTTAGCGGCACAAAGCCTGTCGAAATCGAGAGG