changeset 130:fb3feee2638d draft
Uploaded
author | greg |
---|---|
date | Wed, 22 Mar 2017 08:45:31 -0400 |
parents | 334b95417e14 |
children | 656614635ebf |
files | gene_family_classifier.xml |
diffstat | 1 files changed, 28 insertions(+), 17 deletions(-) |
--- a/gene_family_classifier.xml Tue Mar 21 11:52:36 2017 -0400
+++ b/gene_family_classifier.xml Wed Mar 22 08:45:31 2017 -0400
@@ -124,8 +124,8 @@
  <conditional name="save_hmmscan_log_cond">
  <param name="classifier" type="select" label="Protein classifier">
  <option value="blastp" selected="true">blastp</option>
- <option value="hmmscan">HMMScan</option>
- <option value="both">Both blastp and HMMScan</option>
+ <option value="hmmscan">hmmscan</option>
+ <option value="both">Both blastp and hmmscan</option>
  </param>
  <when value="blastp" />
  <when value="hmmscan">
@@ -156,7 +156,8 @@
  <when value="no"/>
  <when value="yes">
  <param name="super_orthogroups" type="select" label="Clustering distance measure">
- <option value="min_evalue" selected="true">blastp e-value</option>
+ <option value="min_evalue" selected="true">minimum e-value</option>
+ <option value="avg_evalue">average e-value</option>
  </param>
  </when>
  </conditional>
@@ -169,8 +170,8 @@
  <when value="yes">
  <conditional name="single_copy_cond">
  <param name="single_copy" type="select" label="Selection criterion">
- <option value="custom" selected="true">Custom selection</option>
- <option value="taxa">Global selection</option>
+ <option value="taxa" selected="true">Global selection</option>
+ <option value="custom">Custom selection</option>
  </param>
  <when value="custom">
  <conditional name="single_copy_custom_cond">
@@ -185,8 +186,8 @@
  </conditional>
  </when>
  <when value="taxa">
- <param name="single_copy_taxa" type="integer" value="0" label="Minimum single copy taxa" help="Zero values have no affect"/>
- <param name="taxa_present" type="integer" value="0" label="Minimum taxa present" help="Zero values have no affect"/>
+ <param name="single_copy_taxa" type="integer" value="0" min="0" label="Minimum single copy taxa" help="Zero values have no affect"/>
+ <param name="taxa_present" type="integer" value="0" min="0" label="Minimum taxa present" help="Zero values have no affect"/>
  </when>
  </conditional>
  </when>
@@ -265,27 +266,27 @@
  * **Proteins fasta file** - proteins fasta file either produced by the AssemblyPostProcessor tool or an external source selected from your history.
  * **Gene family scaffold** - one of the PlantTribes gene family scaffolds [2-4] installed into Galaxy by the PlantTribes Scaffold Data Manager tool.
  * **Protein clustering method** - gene family scaffold protein clustering method as described in the AssemblyPostProcessor tool.
- * **Protein classifier** - classifier to assign protein sequences into a specified scaffold orthogroups. PlantTribes implements three classification approaches; blastp (faster)[5], hmmscan (slower but more sensitive to the remote homologs)[6], and both blastp and hmmscan (more exhaustive).
+ * **Protein classifier** - classifier to assign protein sequences into a specified scaffold orthogroups. PlantTribes implements three classification approaches; blastp (faster)[5], hmmscan (slower but more sensitive assignment of divergent homologs)[6], and both blastp and hmmscan (disagreements resolved in favor of hmmscan; more exhaustive).
  **Other options**
- * **Super orthogroups configuration** - select ‘Yes’ to enable super orthogroups configuration options. Super orthogroups are constructed through a second iteration of MCL clustering to connect distant, but potentially related orthogroup clusters.
+ * **Super orthogroups configuration** - select ‘Yes’ to enable super orthogroups configuration options. Super orthogroups[7] are constructed through a second iteration of MCL clustering to connect distant, but potentially related orthogroup clusters.
- * **Clustering distance measure** - distance measure used in merging orthogroup clusters into super orthogroup clusters. PlantTribes pre-computed super orthogroups are based on the minimum and average blastp e-value between all pairs of scaffold orthogroups used as the input matrix for MCL clustering algorithm[7].
+ * **Clustering distance measure** - distance measure used in merging orthogroup clusters into super orthogroup clusters. PlantTribes pre-computed super orthogroups are based on the minimum and average blastp e-value between all pairs of scaffold orthogroups used as the input matrix for MCL clustering algorithm[8].
  * **Single copy orthogroups configuration** - select ‘Yes’ to enable single/low-copy orthogroups selection configuration options.
- * **Selection criterion** - single/low-copy orthogroups selection criterion. PlantTribes provides custom and global selection criteria for selecting user defined single/low-copy scaffold orthogoups.
+ * **Selection criterion** - single/low-copy orthogroups selection criterion. PlantTribes provides custom and global selection criteria for selecting user-defined single/low-copy scaffold orthogoups.
+
+ * **Global selection configuration** - the upper limit values of the following two parameters vary depending on the selected gene family scaffold, and the tool will produce an error if the value exceeds the number of species in the circumscribed scaffold. Zero values have no affect.
+
+ * **Minimum single copy taxa** - minimum number of taxa with single copy genes in the orthogroup.
+ * **Minimum taxa present** - minimum number of taxa present in the orthogroup.
  * **Custom selection configuration** - select ‘Yes’ to enable selection of a single copy configuration file. Scaffold configuration templates (.singleCopy.config) of how to customize single/low-copy orthogroups selection can be found in the scaffold data installed into Galaxy via the PlantTribes Scaffolds Download Data Manager tool, and also available at the PlantTribes GitHub repository (https://github.com/dePamphilis/PlantTribes/config ). Single/low-copy settings shown in these templates are used as defaults if ‘No’ is selected.
  * **Custom selection file** - select a single/low-copy customized configuration file from your history.
- * **Global selection configuration** - the upper limit values of the folowing 2 parameters vary depending on the selected gene family scaffold, and the tool will produce an error if the value exceeds the number of species in the circumscribed scaffold. Zero values have no affect.
-
- * **Minimum single copy taxa** - minimum number of taxa with single copy genes in the orthogroup.
- * **Minimum taxa present** - minimum number of taxa present in the orthogroup.
-
  * **Orthogroups fasta configuration** - select ‘Yes’ to create proteins orthogroups fasta files for the classified sequences.
  * **Orthogroups coding sequences** - select ‘Yes’ to create corresponding coding sequences orthogroups fasta files for the classified protein sequences. Requires coding sequences fasta file corresponding to the proteins fasta file to be selected from your history.
@@ -351,9 +352,19 @@
  pages = {205-211},}
  </citation>
  <citation type="bibtex">
+ @article{Wall2008,
+ journal = {Nucleic Acids Research},
+ author = {7. Wall PK, Leebens-Mack J, Muller KF, Field D, Altman NS},
+ title = {PlantTribes: a gene and gene family resource for comparative genomics in plants},
+ year = {2008},
+ volume = {36},
+ number = {suppl 1},
+ pages = {D970-D976},}
+ </citation>
+ <citation type="bibtex">
  @article{Enright2002,
  journal = {Nucleic acids research},
- author = {7. Enright AJ, Van Dongen S, Ouzounis CA},
+ author = {8. Enright AJ, Van Dongen S, Ouzounis CA},
  title = {n efficient algorithm for large-scale detection of protein families},
  year = {2002},
  volume = {30},