changeset 12:7945134d3956 draft

Uploaded
author greg
date Wed, 01 Mar 2017 15:01:15 -0500
parents a0b1f599becc
children 09ef5cc67c78
files kaks_analysis.xml
diffstat 1 files changed, 13 insertions(+), 23 deletions(-)
--- a/kaks_analysis.xml	Wed Mar 01 14:17:43 2017 -0500
+++ b/kaks_analysis.xml	Wed Mar 01 15:01:15 2017 -0500
@@ -184,31 +184,20 @@
 
  * **Required options**
 
-  - **Select gene family clusters** - Sequences classified into gene family clusters, optionally including corresponding coding sequences.
-  - **Orthogroups or gene families proteins scaffold** - PlantTribes scaffolds data.
-  - **Protein clustering method** - One of GFam (domain architecture based clustering), OrthoFinder (broadly defined clusters) or OrthoMCL (narrowly defined clusters).  
-
- * **Multiple sequence alignments options**
+  - **Coding sequences (CDS) fasta file for the species** - Coding sequences (CDS) fasta file for the first species.
+  - **Amino acids (proteins) sequences fasta file for the species** - Amino acids (proteins) sequences fasta file for the first species.
+  - **Select method for pairwise sequence comparison to determine homologous pairs** - Pairwise sequence comparison to determine homologous pairs (cross species comparison requires selection of inputs for second species).
+  - **Orthogroups or gene families proteins scaffold** - PlantTribes scaffolds data installed into Galaxy by the PlantTribes Scaffolds Download Data Manager tool.
 
-  - **Select method for multiple sequence alignments** - Method used for setting multiple sequence alignments.
-  - **Input sequences include corresponding coding sequences?** - Selecting 'Yes' for this option requires that the selected input data format is 'ptorthocs'.
-  - **Construct orthogroup multiple codon alignments?** - Construct orthogroup multiple codon alignments.
-  - **Sequence type used in the phylogenetic inference** - Sequence type (dna or amino acid) used in the phylogenetic inference.
-  - **Use corresponding coding sequences?** - Selecting 'Yes' for this option requires that the selected input data format is 'ptorthocs' or this tool will produce an error.
+ * **Other (optional) options**
 
- * **Phylogenetic trees options**
-
-  - **Phylogenetic trees inference method** - Phylogenetic trees inference method.
-  - **Select rooting order configuration for rooting trees??** - If 'No' is selected, trees will be rooted using the most distant taxon present in the orthogroup.
-  - **Number of replicates for rapid bootstrap analysis and search for the best-scoring ML tree** - Number of replicates for rapid bootstrap analysis and search for the best-scoring ML tree.
-  - **Maximum number of sequences in orthogroup alignments** - Maximum number of sequences in orthogroup alignments.
-  - **Minimum number of sequences in orthogroup alignments** - Minimum number of sequences in orthogroup alignments.
-
- * **MSA quality control options**
-
-  - **Remove sequences with gaps of** - Removes gappy sequences in alignments (i.e., 0.5 removes sequences with 50% gaps).
-  - **Select process used for gap trimming** - Either nucleotide based trimming or alignments are trimed using using trimAl's ML heuristic trimming approach.
-  - **Remove sites in alignments with gaps of** - If the process used for gap trimming is nucleotide based, this is the gap value used when removing gappy sites in alignments (i.e., 0.1 removes sites with 90% gaps).
+  - **Minimum sequence pairwise coverage length between homologous pairs** - Minimum sequence pairwise coverage length between homologous pairs (e.g., 0.5 results in 50% coverage).  Legal values lie between 0.3 and 1.0.
+  - **Evolutionary rate for recalibrating synonymous substitutions (ks) of species** - (applies to paralogous ks analysis) Recalibrate synonymous substitutions (ks) of species using a predetermined evolutionary rate that can be determined from a species tree inferred from a collection of single copy genes from taxa of interest (Cui et al., 2006).
+  - **Select PAML codeml control file?** - Select PAML's codeml control file from your history.  This file is used to perform ML analysis of protein-coding DNA sequences using codon substitution models.  Selecting No uses the default file which does not include input (seqfile, treefile) and output (outfile) parameters of codeml.
+  - **Fit a mixture model of multivariate normal components to synonymous (ks) distribution?** - Fit a mixture model of multivariate normal components to synonymous (ks) distribution to identify significant duplication event(s) in a genome.
+  - **Number of components to fit to synonymous substitutions (ks) distribution** - Number of components to fit to synonymous substitutions (ks) distribution.
+  - **Lower limit of synonymous substitutions (ks)** - Lower limit of synonymous substitutions (ks) - necessary if fitting components to the distribution to reduce background noise from young paralogous pairs due to normal gene births and deaths in a genome.
+  - **Upper limit of synonymous substitutions (ks)** - Upper limit of synonymous substitutions (ks) - necessary if fitting components to the distribution to exclude likely ancient paralogous pairs.
 
     </help>
     <citations>
@@ -220,6 +209,7 @@
             url = {https://github.com/dePamphilis/PlantTribes}
             }
         </citation>
+        <citation type="doi">10.1093/bioinformatics/btw412</citation>
         <citation type="doi">10.1186/1471-2105-10-421</citation>
         <citation type="doi">10.1093/molbev/msm088</citation>
         <citation type="doi">10.18637/jss.v004.i02</citation>