--- a/w4mkmeans.xml	Tue Aug 08 11:23:42 2017 -0400
+++ b/w4mkmeans.xml	Tue Aug 08 12:31:55 2017 -0400
@@ -2,8 +2,8 @@
   <description>Calculate K-means for dataMatrix features or samples</description>

   <requirements>
-    <requirement type="package">r-base</requirement>
-    <requirement type="package">r-batch</requirement>
+    <requirement type="package" version="3.3.2">r-base</requirement>
+    <requirement type="package" version="1.1_4">r-batch</requirement>
   </requirements>

   <stdio>
@@ -33,11 +33,11 @@
     <param name="dataMatrix_in" label="Data matrix file" type="data" format="tabular" help="variable x sample, decimal: '.', missing: NA, mode: numerical, separator: tab" />
     <param name="sampleMetadata_in" label="Sample metadata file" type="data" format="tabular" help="sample x metadata columns, separator: tab" />
     <param name="variableMetadata_in" label="Variable metadata file" type="data" format="tabular" help="variable x metadata columns, separator: tab" />
-    <param name="ksamples" label="K value(s) for samples" type="text" value = "0" help="Single K or comma-separated Ks for samples, or 0 for none." />
-    <param name="kfeatures" label="K value(s) for features" type="text" value = "0" help="Single K or comma-separated Ks for features (variables), or 0 for none." />
-    <param name="iter_max" label="Max number of iterations" type="text" value = "10" help="The maximum number of iterations allowed; default 10." />
-    <param name="nstart" label="Number of random sets" type="text" value = "1" help="How many random sets should be chosen; default 1." />
-    <param name="algorithm" label="Algorithm for clustering" type="select" value = "Hartigan-Wong" help="K-means clustering algorithm, default 'Hartigan-Wong'; alternatives 'Lloyd', 'MacQueen'; 'Forgy' is a synonym for 'Lloyd', see stats::kmeans reference for further info and references.">
+    <param name="ksamples" label="K value(s) for samples" type="text" value = "0" help="[ksamples] Single K or comma-separated Ks for samples, or 0 for none." />
+    <param name="kfeatures" label="K value(s) for features" type="text" value = "0" help="[kfeatures] Single K or comma-separated Ks for features (variables), or 0 for none." />
+    <param name="iter_max" label="Max number of iterations" type="text" value = "10" help="[iter_max] The maximum number of iterations allowed; default 10." />
+    <param name="nstart" label="Number of random sets" type="text" value = "1" help="[nstart] How many random sets should be chosen; default 1." />
+    <param name="algorithm" label="Algorithm for clustering" type="select" value = "Hartigan-Wong" help="[algorithm] K-means clustering algorithm, default 'Hartigan-Wong'; alternatives 'Lloyd', 'MacQueen'; 'Forgy' is a synonym for 'Lloyd', see references for further info.">
       <option value="Forgy">Forgy</option>
       <option value="Hartigan-Wong" selected="True">Hartigan-Wong</option>
       <option value="Lloyd">Lloyd</option>
@@ -82,9 +82,9 @@
 ---------------------------------------------------------------------------


-** Source ** - The source code for the w4mkmeans tool is available (from the Hegeman lab github repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper.
+**Source** - The source code for the w4mkmeans tool is available (from the Hegeman lab github repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper

-** R code used ** - The R code invoked by this wrapper is the R 'stats::kmeans' package documented at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
+**R code used** - The R code invoked by this wrapper is the 'kmeans' function of the R 'stats' package

 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------

@@ -103,14 +103,16 @@
 Description
 -----------

-Calculdate K-means for features clusters (or samples, or both for W4M dataMatrix (i.e., XCMS-preprocessed data files as input).
+Calculate K-means for sample-clusters (or feature-clusters, or both) using W4M dataMatrix (i.e., XCMS-preprocessed data files) as input.

+*Please note that XCMS refers to features as 'variables'.  This documentation does not use either term consistently.*


 -----------------
 Workflow Position
 -----------------

+  - Tool category: Statistical Analysis
   - Upstream tool category: Preprocessing
   - Downstream tool categories: Statistical Analysis

@@ -119,10 +121,10 @@
 Motivation
 ----------

-This tool clusters features (variables), samples, or both from the W4M dataMatrix and writes the results to new columns in variableMetadata, sampleMetadata, or both, respectively.
-If a range of K is supplied, then one column is added for each member within the range.
-Note that this clustering is **not** hierarchical; each member of a cluster is not a member of any other cluster.
+This tool clusters samples, features (variables), or both from the W4M dataMatrix and writes the results to new columns in sampleMetadata, variableMetadata, or both, respectively.

+  - If several, comma-separated K's are supplied, then one column is added for each K.
+  - This clustering is **not** hierarchical; each member of a cluster is not a member of any other cluster.
   - For feature-clustering, each feature is assigned to a cluster such that the feature's response for all samples is closer to the mean of all features for that cluster than to the mean for any other cluster.
   - For sample-clustering, each sample is assigned to a cluster such that the sample's response for all features is closer to the mean of all samples for that cluster than to the mean for any other cluster.

@@ -131,15 +133,15 @@
 Input files
 -----------

-+---------------------------+------------+
-| File                      |   Format   |
-+===========================+============+
-|     Data matrix           |   tabular  |
-+---------------------------+------------+
-|     Sample metadata       |   tabular  |
-+---------------------------+------------+
-|     Variable metadata     |   tabular  |
-+---------------------------+------------+
++--------------------------------------------+------------+
+| File                                       |   Format   |
++============================================+============+
+|     Data matrix                            |   tabular  |
++--------------------------------------------+------------+
+|     Sample metadata                        |   tabular  |
++--------------------------------------------+------------+
+|     Variable (i.e., feature) metadata      |   tabular  |
++--------------------------------------------+------------+


 ----------
@@ -147,55 +149,52 @@
 ----------

 **Data matrix** - input-file dataset
-    | variable x sample 'dataMatrix' (tabular separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the rownames of the sample and feature metadata, respectively (see below)
-    |
+
+  - XCMS variable x sample 'dataMatrix' (tabular separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the rownames of the feature and sample metadata, respectively (see below)

 **Sample metadata** - input-file dataset
-    | sample x metadata 'sampleMetadata' (tabular separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
-    |
+
+  - XCMS sample x metadata 'sampleMetadata' (tabular separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values

 **Feature metadata** - input-file dataset
-    | variable x metadata 'variableMetadata' (tabular separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values
-    |
+
+  - XCMS variable x metadata 'variableMetadata' (tabular separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values

 **kfeatures** - K or K's for features (default = 0)
-    | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
-    |
+
+  - integer or comma-separated integers; zero (the default) or less will result in no calculation.

 **ksamples** - K or K-range for samples (default = 0)
-    | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
-    |
+
+  - integer or comma-separated integers; zero (the default) or less will result in no calculation.

 **iter_max** - maximum_iterations (default = 10)
-    | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
-    |
+
+  - maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).

 **nstart** - how many random sets should be chosen (default = 1)
-    | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
-    |
+
+  - number of random initial sets of cluster centers to try per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).

 ------------
 Output files
 ------------

-sampleMetadata
-    | (tabular separated values) file identical to the Sample metadata file given as an input argument, excepting one column added for each K
-    | 'kmscN' - cluster number for clustering samples with K = N
-    |
+**XCMS sampleMetadata** - (tabular separated values) file identical to the Sample metadata file given as an input argument, excepting one column added for each K
+
+  - **k#** - cluster number for clustering samples with K = #
+
+**XCMS variableMetadata** - (tabular separated values) file identical to the Feature metadata file given as an input argument, excepting one column added for each K
+
+  - **k#** - cluster number for clustering features with K = #

-variableMetadata
-    | (tabular separated values) file identical to the Feature metadata file given as an input argument, excepting one column added for each K
-    | 'kmfcN' - cluster number for clustering features with K = N
-    |
+**scores** - (tabular separated values) file with one line for each K.

-scores
-    | (tabular separated values) file with one line for each K.
-    | 'clusterOn' - what was clustered - either 'sample' or 'feature'
-    | 'k' - the chosen K for clustering
-    | 'totalSS' - total ('between' plus total of 'within') sum of squares
-    | 'betweenSS' - 'between' sum of squares
-    | 'proportion' - betweenSS / totalSS
-    |
+  - **clusterOn** - what was clustered - either 'sample' or 'feature'
+  - **k** - the chosen K for clustering
+  - **totalSS** - total (*between-treatments* plus total of *within-treatments*) sum of squares
+  - **betweenSS** - *between-treatments* sum of squares
+  - **proportion** - betweenSS / totalSS
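
Reviewer's sketch (not part of the changeset): the columns of the scores file described above can be illustrated with a few lines of Python. This is not the R code the wrapper invokes. The function names `parse_k_values` and `score_row` are hypothetical, and dropping non-positive entries from a mixed list of K's is an assumption based on the "zero or less will result in no calculation" wording. Cluster labels are taken as given rather than computed.

```python
import math


def parse_k_values(text):
    """Parse a 'ksamples'/'kfeatures'-style string such as '2,3,4'.

    Assumption: entries of zero or less are simply dropped, mirroring
    the documented 'zero or less will result in no calculation'.
    """
    ks = [int(tok) for tok in text.split(",") if tok.strip()]
    return [k for k in ks if k > 0]


def score_row(points, labels, cluster_on, k):
    """Build one row of a scores-like table for an existing clustering.

    points     : list of equal-length lists of numbers
    labels     : one cluster label per point
    cluster_on : 'sample' or 'feature'
    k          : the K that produced the labels
    """
    n = len(points)
    dims = range(len(points[0]))

    # Total sum of squares: squared distance of every point to the grand mean.
    grand = [sum(p[d] for p in points) / n for d in dims]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in dims)

    # Within sum of squares: squared distance of every point to its own
    # cluster mean, summed over all clusters.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    within_ss = 0.0
    for members in groups.values():
        center = [sum(p[d] for p in members) / len(members) for d in dims]
        within_ss += sum((p[d] - center[d]) ** 2 for p in members for d in dims)

    # totalSS = betweenSS + total of within, so betweenSS is the difference.
    between_ss = total_ss - within_ss
    proportion = between_ss / total_ss if total_ss > 0 else math.nan
    return {
        "clusterOn": cluster_on,
        "k": k,
        "totalSS": total_ss,
        "betweenSS": between_ss,
        "proportion": proportion,
    }


if __name__ == "__main__":
    pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
    print(parse_k_values("2, 3,0"))
    print(score_row(pts, [1, 1, 2, 2], "sample", 2))
```

A proportion close to 1 means that most of the variation lies between clusters rather than within them, which is why it can be compared across the K's supplied.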

 ---------------
 Working example
@@ -210,7 +209,7 @@
 +-------------------+-------------------------------------------------------------------------------------------------------------------+
 | Sample metadata   | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_sampleMetadata.tsv   |
 +-------------------+-------------------------------------------------------------------------------------------------------------------+
-| Variable metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
+| Feature metadata  | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
 +-------------------+-------------------------------------------------------------------------------------------------------------------+

 **Other input parameters**