changeset 1:45efa59a7269 draft
planemo upload for repository https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper/tree/master commit 107d48fe5f1f6fe30d9f08a9bfb2fd802b85c610
author | eschen42 |
---|---|
date | Tue, 08 Aug 2017 12:31:55 -0400 |
parents | 330ee1d840db |
children | bd340dffd887 |
files | w4mkmeans.xml |
diffstat | 1 files changed, 52 insertions(+), 53 deletions(-) |

    --- a/w4mkmeans.xml Tue Aug 08 11:23:42 2017 -0400
    +++ b/w4mkmeans.xml Tue Aug 08 12:31:55 2017 -0400
    @@ -2,8 +2,8 @@
     <description>Calculate K-means for dataMatrix features or samples</description>
     <requirements>
    - <requirement type="package">r-base</requirement>
    - <requirement type="package">r-batch</requirement>
    + <requirement type="package" version="3.3.2">r-base</requirement>
    + <requirement type="package" version="1.1_4">r-batch</requirement>
     </requirements>
     <stdio>
    @@ -33,11 +33,11 @@
     <param name="dataMatrix_in" label="Data matrix file" type="data" format="tabular" help="variable x sample, decimal: '.', missing: NA, mode: numerical, separator: tab" />
     <param name="sampleMetadata_in" label="Sample metadata file" type="data" format="tabular" help="sample x metadata columns, separator: tab" />
     <param name="variableMetadata_in" label="Variable metadata file" type="data" format="tabular" help="variable x metadata columns, separator: tab" />
    - <param name="ksamples" label="K value(s) for samples" type="text" value = "0" help="Single K or comma-separated Ks for samples, or 0 for none." />
    - <param name="kfeatures" label="K value(s) for features" type="text" value = "0" help="Single K or comma-separated Ks for features (variables), or 0 for none." />
    - <param name="iter_max" label="Max number of iterations" type="text" value = "10" help="The maximum number of iterations allowed; default 10." />
    - <param name="nstart" label="Number of random sets" type="text" value = "1" help="How many random sets should be chosen; default 1." />
    - <param name="algorithm" label="Algorithm for clustering" type="select" value = "Hartigan-Wong" help="K-means clustering algorithm, default 'Hartigan-Wong'; alternatives 'Lloyd', 'MacQueen'; 'Forgy' is a synonym for 'Lloyd', see stats::kmeans reference for further info and references.">
    + <param name="ksamples" label="K value(s) for samples" type="text" value = "0" help="[ksamples] Single K or comma-separated Ks for samples, or 0 for none." />
    + <param name="kfeatures" label="K value(s) for features" type="text" value = "0" help="[kfeatures] Single K or comma-separated Ks for features (variables), or 0 for none." />
    + <param name="iter_max" label="Max number of iterations" type="text" value = "10" help="[iter_max] The maximum number of iterations allowed; default 10." />
    + <param name="nstart" label="Number of random sets" type="text" value = "1" help="[nstart] How many random sets should be chosen; default 1." />
    + <param name="algorithm" label="Algorithm for clustering" type="select" value = "Hartigan-Wong" help="[algorithm] K-means clustering algorithm, default 'Hartigan-Wong'; alternatives 'Lloyd', 'MacQueen'; 'Forgy' is a synonym for 'Lloyd', see references for further info.">
     <option value="Forgy">Forgy</option>
     <option value="Hartigan-Wong" selected="True">Hartigan-Wong</option>
     <option value="Lloyd">Lloyd</option>
    @@ -82,9 +82,9 @@
     ---------------------------------------------------------------------------
    -** Source ** - The source code for the w4mkmeans tool is available (from the Hegeman lab github repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper.
    +**Source** - The source code for the w4mkmeans tool is available (from the Hegeman lab github repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper
    -** R code used ** - The R code invoked by this wrapper is the R 'stats::kmeans' package documented at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
    +**R code used** - The R code invoked by this wrapper is the R 'stats::kmeans' package
     ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    @@ -103,14 +103,16 @@
     Description
     -----------
    -Calculdate K-means for features clusters (or samples, or both for W4M dataMatrix (i.e., XCMS-preprocessed data files as input).
    +Calculate K-means for sample-clusters (or feature-clusters, or both) using W4M dataMatrix (i.e., XCMS-preprocessed data files) as input.
    +*Please note that XCMS refers to features as 'variables'. This documentation does not use either term consistently.*
     -----------------
     Workflow Position
     -----------------
    +
     - Tool category: Statistical Analysis
     - Upstream tool category: Preprocessing
     - Downstream tool categories: Statistical Analysis
    @@ -119,10 +121,10 @@
     Motivation
     ----------
    -This tool clusters features (variables), samples, or both from the W4M dataMatrix and writes the results to new columns in variableMetadata, sampleMetadata, or both, respectively.
    -If a range of K is supplied, then one column is added for each member within the range.
    -Note that this clustering is **not** hierarchical; each member of a cluster is not a member of any other cluster.
    +This tool clusters samples, features (variables), or both from the W4M dataMatrix and writes the results to new columns in sampleMetadata, variableMetadata, or both, respectively.
    + - If several, comma-separated K's are supplied, then one column is added for each K.
    + - This clustering is **not** hierarchical; each member of a cluster is not a member of any other cluster.
     - For feature-clustering, each feature is assigned to a cluster such that the feature's response for all samples is closer to the mean of all features for that cluster than to the mean for any other cluster.
     - For sample-clustering, each sample is assigned to a cluster such that the sample's response for all features is closer to the mean of all samples for that cluster than to the mean for any other cluster.
    @@ -131,15 +133,15 @@
     Input files
     -----------
    -+---------------------------+------------+
    -| File | Format |
    -+===========================+============+
    -| Data matrix | tabular |
    -+---------------------------+------------+
    -| Sample metadata | tabular |
    -+---------------------------+------------+
    -| Variable metadata | tabular |
    -+---------------------------+------------+
    ++--------------------------------------------+------------+
    +| File | Format |
    ++============================================+============+
    +| Data matrix | tabular |
    ++--------------------------------------------+------------+
    +| Sample metadata | tabular |
    ++--------------------------------------------+------------+
    +| Variable (i.e., feature) metadata | tabular |
    ++--------------------------------------------+------------+
     ----------
    @@ -147,55 +149,52 @@
     ----------
     **Data matrix** - input-file dataset
    - | variable x sample 'dataMatrix' (tabular separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the rownames of the sample and feature metadata, respectively (see below)
    - |
    +
    + - XCMS variable x sample 'dataMatrix' (tabular separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the rownames of the sample and feature metadata, respectively (see below)
     **Sample metadata** - input-file dataset
    - | sample x metadata 'sampleMetadata' (tabular separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
    - |
    +
    + - XCMS sample x metadata 'sampleMetadata' (tabular separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
     **Feature metadata** - input-file dataset
    - | variable x metadata 'variableMetadata' (tabular separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values
    - |
    +
    + - XCMS variable x metadata 'variableMetadata' (tabular separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values
     **kfeatures** - K or K's for features (default = 0)
    - | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
    - |
    +
    + - integer or comma-separated integers ; zero (the default) or less will result in no calculation.
     **ksamples** - K or K-range for samples (default = 0)
    - | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
    - |
    +
    + - integer or comma-separated integers ; zero (the default) or less will result in no calculation.
     **iter_max** - maximum_iterations (default = 10)
    - | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
    - |
    +
    + - maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
     **nstart** - how many random sets should be chosen (default = 1)
    - | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
    - |
    +
    + - maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
     ------------
     Output files
     ------------
    -sampleMetadata
    - | (tabular separated values) file identical to the Sample metadata file given as an input argument, excepting one column added for each K
    - | 'kmscN' - cluster number for clustering samples with K = N
    - |
    +**XCMS sampleMetadata** - (tabular separated values) file identical to the Sample metadata file given as an input argument, excepting one column added for each K
    +
    + - **k#** - cluster number for clustering samples with K = #
    +
    +**XCMS variableMetadata** - (tabular separated values) file identical to the Feature metadata file given as an input argument, excepting one column added for each K
    +
    + - **k#** - cluster number for clustering features with K = #
    -variableMetadata
    - | (tabular separated values) file identical to the Feature metadata file given as an input argument, excepting one column added for each K
    - | 'kmfcN' - cluster number for clustering features with K = N
    - |
    +**scores** - (tabular separated values) file with one line for each K.
    -scores
    - | (tabular separated values) file with one line for each K.
    - | 'clusterOn' - what was clustered - either 'sample' or 'feature'
    - | 'k' - the chosen K for clustering
    - | 'totalSS' - total ('between' plus total of 'within') sum of squares
    - | 'betweenSS' - 'between' sum of squares
    - | 'proportion' - betweenSS / totalSS
    - |
    + - **clusterOn** - what was clustered - either 'sample' or 'feature'
    + - **k** - the chosen K for clustering
    + - **totalSS** - total (*between-treatements* plus total of *within-treatements*) sum of squares
    + - **betweenSS** - *between-treatements* sum of squares
    + - **proportion** - betweenSS / totalSS
     ---------------
     Working example
    @@ -210,7 +209,7 @@
     +-------------------+-------------------------------------------------------------------------------------------------------------------+
     | Sample metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_sampleMetadata.tsv |
     +-------------------+-------------------------------------------------------------------------------------------------------------------+
    -| Variable metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
    +| Feature metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
     +-------------------+-------------------------------------------------------------------------------------------------------------------+
     **Other input parameters**
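
The `ksamples` and `kfeatures` help text in this changeset describes "a single K or comma-separated Ks", with zero or less meaning no calculation, and the output section says one `k#` column is added per K. The following Python sketch shows one way such a value could be interpreted. It is not the wrapper's code: `parse_ks` and `kmeans_column_names` are hypothetical names, and skipping blank entries and dropping every non-positive K (rather than rejecting the whole value) are assumptions.

```python
from typing import List


def parse_ks(text: str) -> List[int]:
    """Parse a 'ksamples'/'kfeatures'-style value such as "0", "3" or "2,4,6".

    Hypothetical helper. Assumption: blank entries are skipped and every
    K of zero or less is dropped, so "0" means "no clustering".
    """
    ks: List[int] = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "2,,4"
        k = int(token)  # raises ValueError for non-integer input
        if k > 0:
            ks.append(k)
    return ks


def kmeans_column_names(ks: List[int]) -> List[str]:
    """One new metadata column per K, following the 'k#' naming pattern."""
    return [f"k{k}" for k in ks]


if __name__ == "__main__":
    ks = parse_ks("2, 4,6")
    print(ks)                       # [2, 4, 6]
    print(kmeans_column_names(ks))  # ['k2', 'k4', 'k6']
    print(parse_ks("0"))            # []
```

With this interpretation an empty list means the corresponding clustering (samples or features) is simply skipped.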
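
The Motivation text distinguishes feature-clustering from sample-clustering, and the data matrix is described as variable x sample. The Python sketch below assumes the matrix is held as a list of rows (one row per feature): it clusters rows for features and transposed columns for samples. It uses plain Lloyd iterations (the 'Lloyd'/'Forgy' option in the algorithm selector), not the Hartigan-Wong default that `stats::kmeans` runs, and a single random start (comparable to nstart = 1). `lloyd_kmeans`, `cluster_features` and `cluster_samples` are hypothetical names.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: Sequence[Point]) -> List[float]:
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def lloyd_kmeans(points: Sequence[Point], k: int, iter_max: int = 10, seed: int = 0) -> List[int]:
    """Return a 1-based cluster number for each point (Lloyd's algorithm)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if iter_max < 1:
        raise ValueError("iter_max must be at least 1")
    rng = random.Random(seed)
    # One random start: pick k distinct points as the initial centres.
    centres = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iter_max):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centres[c] = mean_point(members)
    return [lab + 1 for lab in labels]


def cluster_features(matrix: Sequence[Point], k: int, **kwargs) -> List[int]:
    # Rows of the variable x sample matrix are features.
    return lloyd_kmeans(matrix, k, **kwargs)


def cluster_samples(matrix: Sequence[Point], k: int, **kwargs) -> List[int]:
    # Columns are samples, so cluster the transposed matrix.
    return lloyd_kmeans([list(col) for col in zip(*matrix)], k, **kwargs)


if __name__ == "__main__":
    data = [  # 4 features (rows) x 3 samples (columns)
        [1.0, 1.1, 9.0],
        [1.2, 0.9, 9.1],
        [10.0, 10.2, 0.5],
        [9.8, 10.1, 0.4],
    ]
    print(cluster_features(data, 2))
    print(cluster_samples(data, 2))
```

Cluster numbers are arbitrary labels; only which items share a number is meaningful, which is why the usage prints them without fixed expected values.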
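
The `scores` output defines totalSS as the between plus the total within sum of squares, betweenSS as the between part, and proportion as betweenSS / totalSS. `stats::kmeans` reports the analogous quantities as `totss`, `tot.withinss` and `betweenss`, where `betweenss = totss - tot.withinss`. A minimal Python sketch of that arithmetic for a single clustering, assuming points are equal-length numeric sequences with one cluster label each (`kmeans_scores` is a hypothetical name):

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def _mean(points: Sequence[Point]) -> List[float]:
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _sq(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_scores(points: Sequence[Point], labels: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (totalSS, betweenSS, proportion) for one clustering.

    Assumes the points are not all identical, so totalSS > 0.
    """
    grand_mean = _mean(points)
    # totalSS: squared distance of every point to the overall mean.
    total_ss = sum(_sq(p, grand_mean) for p in points)
    # Group points by their cluster label.
    clusters: Dict[Hashable, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Total within-cluster SS: distance of members to their own cluster mean.
    within_ss = 0.0
    for members in clusters.values():
        centre = _mean(members)
        within_ss += sum(_sq(p, centre) for p in members)
    between_ss = total_ss - within_ss
    return total_ss, between_ss, between_ss / total_ss


if __name__ == "__main__":
    pts = [[0.0], [2.0], [10.0], [12.0]]
    total, between, prop = kmeans_scores(pts, [1, 1, 2, 2])
    print(total, between, round(prop, 4))  # 104.0 100.0 0.9615
```

A proportion close to 1 means most of the spread is explained by the separation between clusters; comparing it across several K values is one informal way to read the scores file.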