# HG changeset patch
# User recetox
# Date 1644588878 0
# Node ID 10ded21d47c0effcdf222d6378b8fcca04d8547c
# Parent 69e0da4703b587b944011cf11e22e158ab2ebd70
"planemo upload for repository https://github.com/RECETOX/galaxytools/tree/master/tools/ramclustr commit 3d2821ffc97cc4f9287ee83bbddb306a8034daa0"

diff -r 69e0da4703b5 -r 10ded21d47c0 macros.xml
--- a/macros.xml	Fri Feb 04 08:31:26 2022 +0000
+++ b/macros.xml	Fri Feb 11 14:14:38 2022 +0000
@@ -32,69 +32,39 @@
@@ -104,68 +74,82 @@
+    set based on the detected signal intensities for that feature."/>

+    not msp_output_details['merge_msp']
+    msp_output_details['merge_msp']
@@ -234,7 +218,22 @@
     (2) feature names that contain the mass and retention times, separated by a constant delimiter; and
     (3) features in columns and samples in rows.
 
+    +----------------------+-------------------+-------------------+--------------------+--------------------+
+    | sample               | 100.88_262.464    | 100.01_423.699    | 100.003_128.313    | 100.0057_154.686   |
+    +======================+===================+===================+====================+====================+
+    | 10_qc_16x_dil_milliq | 0                 | 195953.6376       | 0                  | 0                  |
+    +----------------------+-------------------+-------------------+--------------------+--------------------+
+    | 11_qc_8x_dil_milliq  | 0                 | 117742.1828       | 4247300.664        | 0                  |
+    +----------------------+-------------------+-------------------+--------------------+--------------------+
+    | 12_qc_32x_dil_milliq | 4470859.38        | 0                 | 2206092.112        | 0                  |
+    +----------------------+-------------------+-------------------+--------------------+--------------------+
+    | 15_qc_16x_dil_milliq | 0                 | 0                 | 2767477.481        | 0                  |
+    +----------------------+-------------------+-------------------+--------------------+--------------------+
+
+    Downstream Tools
+    The output is a msp file or a collection of msp files, with additional Spec Abundance file.
+
     +---------+--------------+----------------------+
     | Name    | Output File  | Format               |
     +=========+==============+======================+
@@ -266,16 +265,16 @@
     RAMClustR approach
 
     RAMClustR was designed to group features designed from the same compound using an approach which is
-    __1.__ unsupervised, __2.__ platform agnosic, and __3.__ devoid of curated rules, as the depth of
-    understanding of these processes is insufficent to enable accurate curation/prediction of all phenomenon
-    that may occur. We acheive this by making two assumptions. The first is that two features derived
+    **1.** unsupervised, **2.** platform agnostic, and **3.** devoid of curated rules, as the depth of
+    understanding of these processes is insufficient to enable accurate curation/prediction of all phenomenon
+    that may occur. We achieve this by making two assumptions. The first is that two features derived
     from the same compound with have (approximately) the same retention time. The second is that two features
     derived from the same compound will have (approximately) the same quantitative trend across all samples
     in the xcms sample set. From these assumptions, we can calculate a retention time similarity score and a
     correlational similarity score for each feature pair. A high similarity score for both retention time and
     correlation indicates a strong probability that two features derive from the same compound. Since both
     conditions must be met, the product of the two similarity scores provides
-    the best approximatio of the total similarity score - i.e. a feature pair with retention time similarity
+    the best approximation of the total similarity score - i.e. a feature pair with retention time similarity
     of 1 and correlational similarity of 0 is unlikely to derive from one compound - 1 x 0 = 0, the final similarity
     score is zero, indicating the two features represent two different compounds. Similarly, a feature pair with
     retention time similarity of 0 and correlational similarity of 1 is unlikely to derive
@@ -283,11 +282,11 @@
     correlational similarity of 1 is likely to derive from one compound - 1 x 1 = 1.
 
     The RAMClustR algorithm is built on creating similarity scores for all pairs of features, submitting
-    this score matrix for heirarchical clustering, and then cutting the resulting dendrogram into neat
+    this score matrix for hierarchical clustering, and then cutting the resulting dendrogram into neat
     chunks using the dynamicTreeCut package - where each 'chunk' of the dendrogram results in a group of
-    features likely to be derived from a single compound. Importantly, this is acheived without looking for
+    features likely to be derived from a single compound. Importantly, this is achieved without looking for
     specific phenomenon (i.e. sodiation), meaning that grouping can be performed on any dataset, whether it
-    is poisitive or negative ionization mode, EI or ESI, LC-MS GC-MS or CE-MS, in-source fragment or complex
+    is positive or negative ionization mode, EI or ESI, LC-MS GC-MS or CE-MS, in-source fragment or complex
     adduction event, and predictable or unpredictable signals.
diff -r 69e0da4703b5 -r 10ded21d47c0 ramclustr.xml
--- a/ramclustr.xml	Fri Feb 04 08:31:26 2022 +0000
+++ b/ramclustr.xml	Fri Feb 11 14:14:38 2022 +0000
@@ -20,51 +20,49 @@
         store_output(
             #if $filetype.type_choice == "xcms":
                 ramclustr_xcms(
-                    input_xcms = "$filetype.input_xcms",
+                    input_xcms = "$filetype.xcms.input_xcms",
+                    use_pheno = $filetype.xcms.usePheno,
             #else:
                 ramclustr_csv(
-                    ms="$filetype.ms_csv.ms",
-                    idmsms="$filetype.ms_csv.idmsms",
-                    feature_delimiter="$filetype.ms_csv.feature_delimiter",
-                    sample_name_column = $filetype.ms_csv.sample_name_column,
-                    retention_time_column= $filetype.ms_csv.retention_time_column,
+                    ms = "$filetype.ms_csv.ms",
+                    idmsms = "$filetype.ms_csv.idmsms",
             #end if
                     sr = $filetype.required.sr,
-                    deep_split = $filetype.required.deepSplit,
-                    block_size = $filetype.required.blocksize,
-                    mult = $filetype.required.mult,
-                    hmax = $filetype.required.hmax,
-                    collapse = $filetype.required.collapse,
-                    use_pheno = $filetype.required.usePheno,
-                    qc_inj_range = $filetype.required.qc_inj_range,
-                    normalize = "$filetype.required.normalize",
-                    min_module_size = $filetype.required.minModuleSize,
-                    linkage = "$filetype.required.linkage",
-                    mzdec = $filetype.required.mzdec,
+                    #if $filetype.type_choice == "xcms":
+                        #if $filetype.required.st
+                            st = $filetype.required.st,
+                        #end if
+                    #else:
+                        st = $filetype.required.st,
+                    #end if
                     cor_method = "$filetype.required.cor_method",
-                    rt_only_low_n = $filetype.required.rt_only_low_n,
-                    replace_zeros = $filetype.required.replace_zeros,
-                    #if $filetype.type_choice == "xcms":
-                        #if $filetype.optional.st
-                            st = $filetype.optional.st,
+                    maxt = $filetype.required.maxt,
+                    linkage = "$clustering.linkage",
+                    min_module_size = $clustering.minModuleSize,
+                    hmax = $clustering.hmax,
+                    deep_split = "$clustering.deepSplit",
+                    normalize = "$normalisation.normalisation_method.normalize",
+                    #if "$normalisation.normalisation_method.normalize" == "batch.qc":
+                        metadata_file = "$normalisation.normalisation_method.batch_order_qc",
+                        qc_inj_range = $normalisation.normalisation_method.qc_inj_range,
                         #end if
-                    #else:
-                        st = $filetype.ms_csv.st,
-                    #end if
-                    #if $filetype.optional.maxt
-                        maxt = $filetype.optional.maxt,
-                    #end if
-                    #if $filetype.optional.fftempdir
-                        fftempdir = $filetype.optional.fftempdir,
-                    #end if
-                    #if $filetype.metadata.batch_order_qc
-                        metadata_file = "${filetype.metadata.batch_order_qc}",
-                    #end if
-                    #if $filetype.metadata.ExpDes
-                        exp_design = "${filetype.metadata.ExpDes}"
+                    block_size = $performance.blocksize,
+                    mult = $performance.mult,
+                    mzdec = $msp_output_details.mzdec,
+                    rt_only_low_n = $extras.rt_only_low_n,
+                    replace_zeros = $extras.replace_zeros,
+                    #if $extras.ExpDes:
+                        exp_design = "${$extras.ExpDes}"
                     #end if
                 ),
-            "$result", "$method_metadata", $filetype.required.merge_msp, "$spec_abundance")
+                $msp_output_details.merge_msp,
+                "$spec_abundance",
+                #if $msp_output_details.merge_msp:
+                    "$mass_spectra_merged"
+                #else:
+                    NULL
+                #end if
+            )
@@ -74,80 +72,102 @@
+    `_
+    .. [2] Hierarchical Clustering - `stats::hclust `_
+    .. [3] Dynamic Dendrogram Pruning Based on Dendrogram Only - `dynamicTreeCut::cutreeDynamicTree `_
     ]]>
diff -r 69e0da4703b5 -r 10ded21d47c0 ramclustr_wrapper.R
--- a/ramclustr_wrapper.R	Fri Feb 04 08:31:26 2022 +0000
+++ b/ramclustr_wrapper.R	Fri Feb 11 14:14:38 2022 +0000
@@ -1,13 +1,16 @@
 store_output <- function(
   ramclustr_obj,
-  output_filename,
-  output_method_metadata,
   output_merge_msp,
-  output_spec_abundance) {
-  save(ramclustr_obj, file = output_filename)
-  RAMClustR::write.methods(ramclustr_obj, output_method_metadata)
+  output_spec_abundance,
+  msp_file) {
   RAMClustR::write.msp(ramclustr_obj, one.file = output_merge_msp)
   write.csv(ramclustr_obj$SpecAbund, file = output_spec_abundance, row.names = TRUE)
+
+  if (!is.null(msp_file)) {
+    exp.name <- ramclustr_obj$ExpDes[[1]][which(row.names(ramclustr_obj$ExpDes[[1]]) == "Experiment"), 1]
+    filename <- paste("spectra/", exp.name, ".msp", sep = "")
+    file.copy(from = filename, to = msp_file, overwrite = TRUE)
+  }
 }
 
 load_experiment_definition <- function(filename) {
@@ -35,25 +38,23 @@
 
 ramclustr_xcms <- function(
   input_xcms,
+  use_pheno,
   sr,
+  st = NULL,
+  cor_method,
+  maxt,
+  linkage,
+  min_module_size,
+  hmax,
   deep_split,
+  normalize,
+  metadata_file = NULL,
+  qc_inj_range,
   block_size,
   mult,
-  hmax,
-  collapse,
-  use_pheno,
-  qc_inj_range,
-  normalize,
-  min_module_size,
-  linkage,
   mzdec,
-  cor_method,
   rt_only_low_n,
   replace_zeros,
-  st = NULL,
-  maxt = NULL,
-  fftempdir = NULL,
-  metadata_file = NULL,
   exp_design = NULL
 ) {
   obj <- load(input_xcms)
@@ -84,7 +85,6 @@
     blocksize = block_size,
     mult = mult,
     hmax = hmax,
-    collapse = collapse,
     usePheno = use_pheno,
     mspout = FALSE,
     qc.inj.range = qc_inj_range,
@@ -94,7 +94,7 @@
     mzdec = mzdec,
     cor.method = cor_method,
     rt.only.low.n = rt_only_low_n,
-    fftempdir = fftempdir,
+    fftempdir = NULL,
     replace.zeros = replace_zeros,
     batch = batch,
     order = order,
@@ -107,28 +107,22 @@
 ramclustr_csv <- function(
   ms,
   idmsms,
-  sample_name_column,
-  feature_delimiter,
-  retention_time_column,
   sr,
+  st,
+  cor_method,
+  maxt,
+  linkage,
+  min_module_size,
+  hmax,
   deep_split,
+  normalize,
+  metadata_file = NULL,
+  qc_inj_range,
   block_size,
   mult,
-  hmax,
-  collapse,
-  use_pheno,
-  qc_inj_range,
-  normalize,
-  min_module_size,
-  linkage,
   mzdec,
-  cor_method,
   rt_only_low_n,
   replace_zeros,
-  st = NULL,
-  maxt = NULL,
-  fftempdir = NULL,
-  metadata_file = NULL,
   exp_design = NULL
 ) {
   if (!file.exists(idmsms))
@@ -154,9 +148,6 @@
   x <- RAMClustR::ramclustR(
     ms = ms,
     idmsms = idmsms,
-    featdelim = feature_delimiter,
-    timepos = retention_time_column,
-    sampNameCol = sample_name_column,
     st = st,
     maxt = maxt,
     sr = sr,
@@ -164,8 +155,6 @@
     blocksize = block_size,
     mult = mult,
     hmax = hmax,
-    collapse = collapse,
-    usePheno = use_pheno,
     mspout = FALSE,
     qc.inj.range = qc_inj_range,
     normalize = normalize,
@@ -174,7 +163,7 @@
     mzdec = mzdec,
     cor.method = cor_method,
     rt.only.low.n = rt_only_low_n,
-    fftempdir = fftempdir,
+    fftempdir = NULL,
     replace.zeros = replace_zeros,
     batch = batch,
     order = order,
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test1_metadata_xcms_1.txt
--- a/test-data/test1_metadata_xcms_1.txt	Fri Feb 04 08:31:26 2022 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
-Raw mass spectrometry data were processed using an R based workflow for feature detection, retention time alignment, feature grouping, peak filling, feature clustering. XCMS(v.3.14.0)was used for feature detection and retention time alighment. Processing was performed using R(v.R Core Team 2021). Feature data was input as an xcms object with ramclustR parameter settings of st = 12.99 sr = 0.5 and maxt = 259.8.RAMClustR (version 1.2.2) was utilized to cluster features into spectra (Broeckling 2014). The feature similarity matrix was clustered using fastcluster package heirarchical clustering method using the average method. The dendrogram was cut using the cutreeDynamicTree function from the dynamicTreeCut package. Cutting parameters were set to minModuleSize = 2, hmax = 0.3, and deepSplit = FALSE.
-
- 1041 features were collapsed into 174 spectra.
-
-(Broeckling 2014): Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Anal Chem. 2014. 86(14):6812-7.
-
-R Core Team: R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
-
-R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
\ No newline at end of file
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test1_ramclustObj_xcms_1.rdata
Binary file test-data/test1_ramclustObj_xcms_1.rdata has changed
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test2_metadata_xcms_2.txt
--- a/test-data/test2_metadata_xcms_2.txt	Fri Feb 04 08:31:26 2022 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
-Raw mass spectrometry data were processed using an R based workflow for feature detection, retention time alignment, feature grouping, peak filling, feature clustering. XCMS(v.3.14.0)was used for feature detection and retention time alighment. Processing was performed using R(v.R Core Team 2021). Feature data was input as an xcms object with ramclustR parameter settings of st = 3.92 sr = 0.5 and maxt = 78.4.RAMClustR (version 1.2.2) was utilized to cluster features into spectra (Broeckling 2014). The feature similarity matrix was clustered using fastcluster package heirarchical clustering method using the average method. The dendrogram was cut using the cutreeDynamicTree function from the dynamicTreeCut package. Cutting parameters were set to minModuleSize = 2, hmax = 0.3, and deepSplit = FALSE.
-
- 5881 features were collapsed into 949 spectra.
-
-(Broeckling 2014): Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Anal Chem. 2014. 86(14):6812-7.
-
-R Core Team: R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
-
-R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
\ No newline at end of file
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test2_ramclustObj_xcms_2.rdata
Binary file test-data/test2_ramclustObj_xcms_2.rdata has changed
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test3_metadata_csv_1.txt
--- a/test-data/test3_metadata_csv_1.txt	Fri Feb 04 08:31:26 2022 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
-Raw mass spectrometry data were processed using an R based workflow for feature detection, retention time alignment, feature grouping, peak filling, feature clustering. Feature data was input as .csv files with ramclustR parameter settings of st = 5 sr = 0.5 and maxt = 1.RAMClustR (version 1.2.2) was utilized to cluster features into spectra (Broeckling 2014). The feature similarity matrix was clustered using fastcluster package heirarchical clustering method using the average method. The dendrogram was cut using the cutreeDynamicTree function from the dynamicTreeCut package. Cutting parameters were set to minModuleSize = 2, hmax = 0.3, and deepSplit = FALSE.
-
- 203 features were collapsed into 22 spectra. Since there were fewer than five injections, clustering was performed only using retention time simiilarity.
-
-(Broeckling 2014): Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Anal Chem. 2014. 86(14):6812-7.
-
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test3_ramclustObj_csv_1.rdata
Binary file test-data/test3_ramclustObj_csv_1.rdata has changed
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test4_metadata_csv_2.txt
--- a/test-data/test4_metadata_csv_2.txt	Fri Feb 04 08:31:26 2022 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
-Raw mass spectrometry data were processed using an R based workflow for feature detection, retention time alignment, feature grouping, peak filling, feature clustering. Feature data was input as .csv files with ramclustR parameter settings of st = 1 sr = 0.5 and maxt = 60.RAMClustR (version 1.2.2) was utilized to cluster features into spectra (Broeckling 2014). The feature similarity matrix was clustered using fastcluster package heirarchical clustering method using the average method. The dendrogram was cut using the cutreeDynamicTree function from the dynamicTreeCut package. Cutting parameters were set to minModuleSize = 2, hmax = 0.3, and deepSplit = FALSE.
-
- 203 features were collapsed into 38 spectra. Since there were fewer than five injections, clustering was performed only using retention time simiilarity.
-
-(Broeckling 2014): Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Anal Chem. 2014. 86(14):6812-7.
-
diff -r 69e0da4703b5 -r 10ded21d47c0 test-data/test4_ramclustObj_csv_2.rdata
Binary file test-data/test4_ramclustObj_csv_2.rdata has changed
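The help text in `macros.xml` describes the RAMClustR approach in prose. Every feature pair gets a retention-time similarity and a correlational similarity. The two scores are multiplied, the resulting matrix is clustered hierarchically, and the dendrogram is cut into groups. The Python sketch below walks through that pipeline on toy data. It is not RAMClustR's implementation, and it rests on several assumptions:

- The similarity functions are Gaussian-shaped and scaled by `st` and `sr`, borrowing the names of the wrapper's parameters. Check the RAMClustR source for the exact formulas.
- Clustering uses average linkage, the method named in the removed test metadata.
- A plain fixed-height cut at `hmax` replaces dynamicTreeCut's adaptive pruning.

The helper names (`parse_feature_name`, `pearson`, `pair_similarity`, `cluster_features`), the feature names and the intensities are all hypothetical.

```python
import math

# Hypothetical helpers illustrating the RAMClustR idea; this is not the package's code.


def parse_feature_name(name, delimiter="_"):
    """Split a CSV feature name such as '100.88_262.464' into (mz, rt)."""
    mz, rt = name.split(delimiter)
    return float(mz), float(rt)


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant vector."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pair_similarity(rt_a, rt_b, x, y, st, sr):
    """Product of retention-time and correlational similarity.

    Assumption: Gaussian forms scaled by st and sr, each equal to 1 at a perfect match.
    """
    rt_sim = math.exp(-((rt_a - rt_b) ** 2) / (2 * st ** 2))
    cor_sim = math.exp(-((1 - pearson(x, y)) ** 2) / (2 * sr ** 2))
    # Both conditions must hold, so a near-zero factor on either side zeroes the product.
    return rt_sim * cor_sim


def cluster_features(names, intensities, st, sr, hmax):
    """Average-linkage agglomeration on distance = 1 - similarity,
    cut at a fixed height hmax (a simplified stand-in for dynamicTreeCut)."""
    rts = [parse_feature_name(n)[1] for n in names]
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = pair_similarity(rts[i], rts[j], intensities[i], intensities[j], st, sr)
            dist[i][j] = dist[j][i] = 1.0 - s

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > hmax:  # the next merge would lie above the cut height: stop
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical data: four features measured in five samples.
names = ["100.88_262.464", "101.89_262.6", "150.02_262.5", "200.05_423.7"]
intensities = [
    [1, 2, 3, 4, 5],   # feature 0
    [2, 4, 6, 8, 10],  # same trend and retention time as feature 0
    [5, 4, 3, 2, 1],   # same retention time, opposite trend
    [1, 2, 3, 4, 5],   # same trend, distant retention time
]
groups = cluster_features(names, intensities, st=5.0, sr=0.5, hmax=0.3)
print(groups)  # [[0, 1], [2], [3]]
```

Features 0 and 1 share both retention time and trend, so their product score is close to 1 and they merge.

Feature 2 co-elutes with them but trends the opposite way (r = -1). That drives the correlational factor down to exp(-8), which matches the help text's case of retention time similarity 1 and correlational similarity 0.

Feature 3 follows the same trend but sits at a distant retention time, the case of retention time similarity 0 and correlational similarity 1.

Features 2 and 3 therefore each stay in their own group.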