# HG changeset patch
# User greg
# Date 1488398475 18000
# Node ID 7945134d3956f713ec1b67a791c31295e7788248
# Parent a0b1f599becc12ac82374043ece9e7f4408c53e9
Uploaded

diff -r a0b1f599becc -r 7945134d3956 kaks_analysis.xml
--- a/kaks_analysis.xml Wed Mar 01 14:17:43 2017 -0500
+++ b/kaks_analysis.xml Wed Mar 01 15:01:15 2017 -0500
@@ -184,31 +184,20 @@

  * **Required options**

- - **Select gene family clusters** - Sequences classified into gene family clusters, optionally including corresponding coding sequences.
- - **Orthogroups or gene families proteins scaffold** - PlantTribes scaffolds data.
- - **Protein clustering method** - One of GFam (domain architecture based clustering), OrthoFinder (broadly defined clusters) or OrthoMCL (narrowly defined clusters).
-
- * **Multiple sequence alignments options**
+ - **Coding sequences (CDS) fasta file for the species** - Coding sequences (CDS) fasta file for the first species.
+ - **Amino acid (protein) sequences fasta file for the species** - Amino acid (protein) sequences fasta file for the first species.
+ - **Select method for pairwise sequence comparison to determine homologous pairs** - Pairwise sequence comparison to determine homologous pairs (cross-species comparison requires selection of inputs for the second species).
+ - **Orthogroups or gene families proteins scaffold** - PlantTribes scaffolds data installed into Galaxy by the PlantTribes Scaffolds Download Data Manager tool.

- - **Select method for multiple sequence alignments** - Method used for setting multiple sequence alignments.
- - **Input sequences include corresponding coding sequences?** - Selecting 'Yes' for this option requires that the selected input data format is 'ptorthocs'.
- - **Construct orthogroup multiple codon alignments?** - Construct orthogroup multiple codon alignments.
- - **Sequence type used in the phylogenetic inference** - Sequence type (dna or amino acid) used in the phylogenetic inference.
- - **Use corresponding coding sequences?** - Selecting 'Yes' for this option requires that the selected input data format is 'ptorthocs' or this tool will produce an error.
+ * **Other (optional) options**

- * **Phylogenetic trees options**
-
- - **Phylogenetic trees inference method** - Phylogenetic trees inference method.
- - **Select rooting order configuration for rooting trees??** - If 'No' is selected, trees will be rooted using the most distant taxon present in the orthogroup.
- - **Number of replicates for rapid bootstrap analysis and search for the best-scoring ML tree** - Number of replicates for rapid bootstrap analysis and search for the best-scoring ML tree.
- - **Maximum number of sequences in orthogroup alignments** - Maximum number of sequences in orthogroup alignments.
- - **Minimum number of sequences in orthogroup alignments** - Minimum number of sequences in orthogroup alignments.
-
- * **MSA quality control options**
-
- - **Remove sequences with gaps of** - Removes gappy sequences in alignments (i.e., 0.5 removes sequences with 50% gaps).
- - **Select process used for gap trimming** - Either nucleotide based trimming or alignments are trimed using using trimAl's ML heuristic trimming approach.
- - **Remove sites in alignments with gaps of** - If the process used for gap trimming is nucleotide based, this is the gap value used when removing gappy sites in alignments (i.e., 0.1 removes sites with 90% gaps).
+ - **Minimum sequence pairwise coverage length between homologous pairs** - Minimum sequence pairwise coverage length between homologous pairs (e.g., 0.5 results in 50% coverage). Legal values lie between 0.3 and 1.0.
+ - **Evolutionary rate for recalibrating synonymous substitutions (ks) of species** - (applies to paralogous ks analysis) Recalibrate synonymous substitutions (ks) of species using a predetermined evolutionary rate that can be determined from a species tree inferred from a collection of single copy genes from taxa of interest (Cui et al., 2006).
+ - **Select PAML codeml control file?** - Select PAML's codeml control file from your history. This file is used to perform ML analysis of protein-coding DNA sequences using codon substitution models. Selecting No uses the default file, which does not include the input (seqfile, treefile) and output (outfile) parameters of codeml.
+ - **Fit a mixture model of multivariate normal components to synonymous (ks) distribution?** - Fit a mixture model of multivariate normal components to synonymous (ks) distribution to identify significant duplication event(s) in a genome.
+ - **Number of components to fit to synonymous substitutions (ks) distribution** - Number of components to fit to synonymous substitutions (ks) distribution.
+ - **Lower limit of synonymous substitutions (ks)** - Lower limit of synonymous substitutions (ks) - necessary if fitting components to the distribution to reduce background noise from young paralogous pairs due to normal gene births and deaths in a genome.
+ - **Upper limit of synonymous substitutions (ks)** - Upper limit of synonymous substitutions (ks) - necessary if fitting components to the distribution to exclude likely ancient paralogous pairs.
@@ -220,6 +209,7 @@
 url = {https://github.com/dePamphilis/PlantTribes}
 }
+ 10.1093/bioinformatics/btw412
 10.1186/1471-2105-10-421
 10.1093/molbev/msm088
 10.18637/jss.v004.i02
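
Note on the ks options added by this patch: the lower and upper ks limits and the number of mixture components interact, so a small illustration may help when reviewing the help text. The sketch below is not the tool's implementation. It assumes a plain expectation-maximization fit of univariate normal components in pure Python, and the names `filter_ks` and `fit_normal_mixture` are hypothetical. It shows the limits being applied first, then the requested number of components being fitted to the remaining ks values.

```python
import math


def filter_ks(ks_values, lower, upper):
    """Keep only ks values inside the closed interval [lower, upper]."""
    return [ks for ks in ks_values if lower <= ks <= upper]


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_normal_mixture(values, n_components, n_iter=200, min_sd=1e-3):
    """Fit a 1-D normal mixture with plain EM (hypothetical helper).

    Returns a list of (weight, mean, sd) tuples sorted by mean.
    """
    if not values or n_components < 1:
        raise ValueError("need at least one value and one component")
    data = sorted(values)
    n = len(data)

    # Deterministic start: means at evenly spaced quantiles of the data,
    # every component starting with the overall standard deviation.
    means = [data[int((i + 0.5) * n / n_components)] for i in range(n_components)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [max(overall_sd, min_sd)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each ks value.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), min_sd)

    return sorted(zip(weights, means, sds), key=lambda c: c[1])


if __name__ == "__main__":
    # Made-up ks values: two clusters plus one very young and one very old pair.
    ks = [0.01, 0.28, 0.29, 0.30, 0.31, 0.32, 1.40, 1.45, 1.50, 1.55, 1.60, 3.5]
    kept = filter_ks(ks, lower=0.1, upper=2.0)
    for weight, mean, sd in fit_normal_mixture(kept, n_components=2):
        print(f"weight={weight:.2f} mean={mean:.2f} sd={sd:.3f}")
```

In this sketch the limits are applied before fitting, so that very young or very old paralogous pairs do not pull a component toward the tails. Whether the tool applies them in the same order is an assumption.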