# HG changeset patch
# User eschen42
# Date 1502209915 14400
# Node ID 45efa59a7269d7bfa36c9a747693178e78574aef
# Parent 330ee1d840db257b1f815946b21b6ad2e410893a
planemo upload for repository https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper/tree/master commit 107d48fe5f1f6fe30d9f08a9bfb2fd802b85c610
diff -r 330ee1d840db -r 45efa59a7269 w4mkmeans.xml
--- a/w4mkmeans.xml Tue Aug 08 11:23:42 2017 -0400
+++ b/w4mkmeans.xml Tue Aug 08 12:31:55 2017 -0400
@@ -2,8 +2,8 @@
Calculate K-means for dataMatrix features or samples
- r-base
- r-batch
+ r-base
+ r-batch
@@ -33,11 +33,11 @@
-
-
-
-
-
+
+
+
+
+
@@ -82,9 +82,9 @@
---------------------------------------------------------------------------
-** Source ** - The source code for the w4mkmeans tool is available (from the Hegeman lab github repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper.
+**Source** - The source code for the w4mkmeans tool is available (from the Hegeman Lab GitHub repository) at https://github.com/HegemanLab/w4mkmeans_galaxy_wrapper
-** R code used ** - The R code invoked by this wrapper is the R 'stats::kmeans' package documented at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
+**R code used** - The R code invoked by this wrapper is the 'kmeans' function from the R 'stats' package ('stats::kmeans')
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -103,14 +103,16 @@
Description
-----------
-Calculdate K-means for features clusters (or samples, or both for W4M dataMatrix (i.e., XCMS-preprocessed data files as input).
+Calculate K-means clusters for samples (or features, or both) using the W4M dataMatrix (i.e., XCMS-preprocessed data files) as input.
+*Please note that XCMS refers to features as 'variables'. This documentation uses the two terms interchangeably.*
-----------------
Workflow Position
-----------------
+ - Tool category: Statistical Analysis
- Upstream tool category: Preprocessing
- Downstream tool categories: Statistical Analysis
@@ -119,10 +121,10 @@
Motivation
----------
-This tool clusters features (variables), samples, or both from the W4M dataMatrix and writes the results to new columns in variableMetadata, sampleMetadata, or both, respectively.
-If a range of K is supplied, then one column is added for each member within the range.
-Note that this clustering is **not** hierarchical; each member of a cluster is not a member of any other cluster.
+This tool clusters samples, features (variables), or both from the W4M dataMatrix and writes the results to new columns in sampleMetadata, variableMetadata, or both, respectively.
+ - If several comma-separated values of K are supplied, one column is added for each K.
+ - This clustering is **not** hierarchical; each item belongs to exactly one cluster.
- For feature-clustering, each feature is assigned to a cluster such that the feature's response for all samples is closer to the mean of all features for that cluster than to the mean for any other cluster.
- For sample-clustering, each sample is assigned to a cluster such that the sample's response for all features is closer to the mean of all samples for that cluster than to the mean for any other cluster.
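The assignment rule described above is what each k-means iteration enforces. The following Python sketch is illustrative only. It assumes a small in-memory dataMatrix with features as rows and samples as columns. It uses Lloyd's algorithm with the first K items as starting centers. stats::kmeans defaults to the Hartigan-Wong algorithm with random starts, so its results can differ. The helper names and data values are hypothetical. Feature-clustering clusters the rows; sample-clustering clusters the columns, i.e., the rows of the transposed matrix.

```python
# Illustrative sketch only: Lloyd's k-means on a tiny in-memory matrix.
# Assumption: rows are features and columns are samples (W4M dataMatrix layout).
# Assumption: the first k items seed the centers; stats::kmeans instead defaults
# to Hartigan-Wong with random starts, so real results may differ.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lloyd_kmeans(items, k, iter_max=10):
    """Return a 1-based cluster number for each item (row vector)."""
    centers = [list(v) for v in items[:k]]  # hypothetical deterministic seeding
    labels = [None] * len(items)
    for _ in range(iter_max):
        # Assignment step: each item joins the cluster whose mean is nearest.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in items]
        if new_labels == labels:
            break  # converged: no item changed cluster
        labels = new_labels
        # Update step: each center becomes the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(items, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return [lab + 1 for lab in labels]


def transpose(matrix):
    """Swap rows and columns so that samples become the row vectors."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical dataMatrix: 4 features (rows) x 3 samples (columns).
data_matrix = [
    [1.0, 2.0, 1.5],
    [1.2, 2.1, 1.4],
    [9.0, 8.5, 9.2],
    [8.8, 8.7, 9.1],
]
feature_clusters = lloyd_kmeans(data_matrix, k=2)            # cluster the rows
sample_clusters = lloyd_kmeans(transpose(data_matrix), k=2)  # cluster the columns
print(feature_clusters, sample_clusters)  # → [1, 1, 2, 2] [1, 2, 1]
```

Cluster numbers are arbitrary labels: swapping 1 and 2 describes the same partition.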
@@ -131,15 +133,15 @@
Input files
-----------
-+---------------------------+------------+
-| File | Format |
-+===========================+============+
-| Data matrix | tabular |
-+---------------------------+------------+
-| Sample metadata | tabular |
-+---------------------------+------------+
-| Variable metadata | tabular |
-+---------------------------+------------+
++--------------------------------------------+------------+
+| File | Format |
++============================================+============+
+| Data matrix | tabular |
++--------------------------------------------+------------+
+| Sample metadata | tabular |
++--------------------------------------------+------------+
+| Variable (i.e., feature) metadata | tabular |
++--------------------------------------------+------------+
----------
@@ -147,55 +149,52 @@
----------
**Data matrix** - input-file dataset
- | variable x sample 'dataMatrix' (tabular separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the rownames of the sample and feature metadata, respectively (see below)
- |
+
+ - XCMS variable x sample 'dataMatrix' (tab-separated values) file of the numeric data matrix, with . as decimal, and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the row names of the feature and sample metadata, respectively (see below)
**Sample metadata** - input-file dataset
- | sample x metadata 'sampleMetadata' (tabular separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
- |
+
+ - XCMS sample x metadata 'sampleMetadata' (tab-separated values) file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
**Feature metadata** - input-file dataset
- | variable x metadata 'variableMetadata' (tabular separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values
- |
+
+ - XCMS variable x metadata 'variableMetadata' (tab-separated values) file of the numeric and/or character feature metadata, with . as decimal and NA for missing values
**kfeatures** - K or K's for features (default = 0)
- | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
- |
+
+ - an integer or comma-separated integers; zero (the default) or less results in no calculation.
**ksamples** - K or K-range for samples (default = 0)
- | integer or comma-separated integers ; zero (the default) or less will result in no calculation.
- |
+
+ - an integer or comma-separated integers; zero (the default) or less results in no calculation.
**iter_max** - maximum_iterations (default = 10)
- | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
- |
+
+ - maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
**nstart** - how many random sets should be chosen (default = 1)
- | maximum number of iterations per calculation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
- |
+
+ - number of random sets of initial cluster centers to try; the result with the lowest total within-cluster sum of squares is kept (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html).
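The kfeatures and ksamples values could be interpreted as in the following Python sketch. The function name parse_ks is hypothetical. The sketch skips each non-positive entry in a comma-separated list rather than rejecting the whole list; that is an assumption, not documented wrapper behavior.

```python
# Illustrative sketch only: parse_ks is a hypothetical helper, not part of the wrapper.
# Assumption: each K that is zero or less is skipped individually.


def parse_ks(text):
    """Parse '3' or '2,3,5' into a list of positive integers."""
    ks = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as '2,,3'
        k = int(token)
        if k > 0:
            ks.append(k)  # zero or less: no calculation for this K
    return ks


print(parse_ks("0"))       # → []
print(parse_ks("2, 3,5"))  # → [2, 3, 5]
print(parse_ks("-1,4"))    # → [4]
```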
------------
Output files
------------
-sampleMetadata
- | (tabular separated values) file identical to the Sample metadata file given as an input argument, excepting one column added for each K
- | 'kmscN' - cluster number for clustering samples with K = N
- |
+**XCMS sampleMetadata** - (tab-separated values) file identical to the Sample metadata file given as an input argument, except that one column is added for each K
+
+ - **k#** - cluster number for clustering samples with K = #
+
+**XCMS variableMetadata** - (tab-separated values) file identical to the Feature metadata file given as an input argument, except that one column is added for each K
+
+ - **k#** - cluster number for clustering features with K = #
-variableMetadata
- | (tabular separated values) file identical to the Feature metadata file given as an input argument, excepting one column added for each K
- | 'kmfcN' - cluster number for clustering features with K = N
- |
+**scores** - (tab-separated values) file with one line for each clustering (i.e., for each K applied to samples or to features).
-scores
- | (tabular separated values) file with one line for each K.
- | 'clusterOn' - what was clustered - either 'sample' or 'feature'
- | 'k' - the chosen K for clustering
- | 'totalSS' - total ('between' plus total of 'within') sum of squares
- | 'betweenSS' - 'between' sum of squares
- | 'proportion' - betweenSS / totalSS
- |
+ - **clusterOn** - what was clustered - either 'sample' or 'feature'
+ - **k** - the chosen K for clustering
+ - **totalSS** - total sum of squares (*between-cluster* sum of squares plus the total of the *within-cluster* sums of squares)
+ - **betweenSS** - *between-cluster* sum of squares
+ - **proportion** - betweenSS / totalSS
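The three scores relate as follows. totalSS is the sum of squared distances from each item to the grand mean. The within-cluster sum of squares instead uses each cluster's own mean. betweenSS is the difference between the two. These quantities correspond to the totss, tot.withinss, and betweenss components returned by stats::kmeans. The following Python sketch computes them for a given clustering; its helper names and toy one-dimensional data are hypothetical.

```python
# Illustrative sketch only: helper names and data are hypothetical.


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sum_sq(vectors, center):
    """Sum of squared Euclidean distances from each vector to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(v, center)) for v in vectors)


def cluster_scores(items, labels):
    """Return (totalSS, betweenSS, proportion) for one clustering."""
    total_ss = sum_sq(items, mean_vector(items))
    within_ss = 0.0
    for cluster in set(labels):
        members = [v for v, lab in zip(items, labels) if lab == cluster]
        within_ss += sum_sq(members, mean_vector(members))
    between_ss = total_ss - within_ss
    # Guard against identical items, where totalSS is zero.
    proportion = between_ss / total_ss if total_ss > 0 else float("nan")
    return total_ss, between_ss, proportion


# Four one-feature items split into two clusters.
items = [[0.0], [1.0], [10.0], [11.0]]
labels = [1, 1, 2, 2]
total_ss, between_ss, proportion = cluster_scores(items, labels)
print(total_ss, between_ss, round(proportion, 4))  # → 101.0 100.0 0.9901
```

A proportion close to 1 means most of the variation lies between clusters rather than within them.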
---------------
Working example
@@ -210,7 +209,7 @@
+-------------------+-------------------------------------------------------------------------------------------------------------------+
| Sample metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_sampleMetadata.tsv |
+-------------------+-------------------------------------------------------------------------------------------------------------------+
-| Variable metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
+| Feature metadata | https://raw.githubusercontent.com/HegemanLab/w4mkmeans_galaxy_wrapper/master/test-data/input_variableMetadata.tsv |
+-------------------+-------------------------------------------------------------------------------------------------------------------+
**Other input parameters**