diff dada2_dada.xml @ 3:0b3194ac6a95 draft
planemo upload for repository https://github.com/bernt-matthias/mb-galaxy-tools/tree/topic/dada2/tools/dada2 commit 5b1603bbcd3f139cad5c876be83fcb39697b5613-dirty
| author | matthias |
|---|---|
| date | Mon, 29 Apr 2019 09:02:17 -0400 |
| parents | 249ba5cbeb6e |
| children | 4a770a261b16 |
--- a/dada2_dada.xml	Tue Apr 09 07:13:24 2019 -0400
+++ b/dada2_dada.xml	Mon Apr 29 09:02:17 2019 -0400
@@ -22,13 +22,13 @@
 #if $batch_cond.batch_select == "no"
 derep <- list()
 #for $d in $batch_cond.derep:
-derep[["$d.element_identifier"]] <- readRDS(file.path('$d.extra_files_path', 'Rdata'))
+derep[["$d.element_identifier"]] <- readRDS('$d')
 #end for
 #else
-derep <- readRDS(file.path('$batch_cond.derep.extra_files_path', 'Rdata'))
+derep <- readRDS('$batch_cond.derep')
 #end if
-err <- readRDS(file.path('$errorrates.extra_files_path',"Rdata"))
+err <- readRDS('$errorrates')
 
 #if $batch_cond.batch_select == "yes":
 pool <- F
@@ -37,18 +37,24 @@
 pool <- T
 #else if $batch_cond.pool == "FALSE"
 pool <- F
-    #else
+    #else
 pool <- 'pseudo'
 #end if
 #end if
 
-dada_result <- dada(derep, err,
-## not needed for end user: errorEstimationFunction = $errfoo, selfConsist = $selfconsist,
+dada_result <- dada(derep, err,
+## not needed for end user: errorEstimationFunction = $errfoo, selfConsist = $selfconsist,
 pool = pool, multithread = nthreads)
 
-#if $batch_cond.batch_select == "no":
+    #if $batch_cond.batch_select == "no":
+        #if len($batch_cond.derep) > 1:
 for( id in names(dada_result) ){
     saveRDS(dada_result[[id]], file=file.path("output" ,paste(id, "dada2_dada", sep=".")))
 }
+        #else
+            #for $d in $batch_cond.derep:
+saveRDS(dada_result, file=file.path("output" ,paste('$d.element_identifier', "dada2_dada", sep=".")))
+            #end for
+        #end if
 #else
 saveRDS(dada_result, file='$dada')
 #end if
@@ -74,7 +80,7 @@
         </when>
         <when value="no">
             <param name="derep" type="data" multiple="true" format="dada2_derep" label="Dereplicated reads"/>
-            <param name="pool" type="select" label="Pool samples">
+            <param argument="pool" type="select" label="Pool samples">
                 <option value="FALSE">process samples individually</option>
                 <option value="TRUE">pool samples</option>
                 <option value="pseudo">pseudo pooling between individually processed samples</option>
@@ -82,12 +88,12 @@
             </when>
         </conditional>
        <param name="errorrates" type="data" format="dada2_errorrates" label="Error rates"/>
-        <!-- not needed for end user I guess
-        <expand macro="errorEstimationFunction"/>
-        <param name="selfconsist" type="boolean" checked="false" truevalue="TRUE" falsevalue="FALSE" label="Alternate between sample inference and error rate estimation until convergence"/>-->
+        <!-- not needed for end user I guess
+        <expand macro="errorEstimationFunction"/>
+        <param name="selfconsist" type="boolean" checked="false" truevalue="TRUE" falsevalue="FALSE" label="Alternate between sample inference and error rate estimation until convergence"/>-->
     </inputs>
     <outputs>
-        <data name="dada" format="dada2_dada">
+        <data name="dada" format="dada2_dada">
             <filter>batch_cond['batch_select']=="yes"</filter>
         </data>
         <collection name="data_collection" type="list">
@@ -96,20 +102,24 @@
         </collection>
     </outputs>
     <tests>
-        <test>
-            <param name="batch_cond|batch_select" value="no"/>
-            <param name="batch_cond|derep" value="derepFastq_single_F3D0_R1.table" ftype="dada2_derep" >
-                <extra_files type="Rdata" name="Rdata" value="derepFastq_paired_F3D0_R1.Rdata" />
-            </param>
-            <param name="errorrates" value="learnErrors_forward.tab" ftype="dada2_errorrates" >
-                <extra_files type="Rdata" name="Rdata" value="learnErrors_forward.Rdata" />
-            </param>
+        <test>
+            <param name="batch_cond|batch_select" value="no"/>
+            <param name="batch_cond|derep" value="derepFastq_F3D0_R1.Rdata" ftype="dada2_derep" />
+            <param name="errorrates" value="learnErrors_F3D0_R1.Rdata" ftype="dada2_errorrates" />
             <output_collection name="data_collection" type="list">
-                <element name="single_F3D0_R1" file="single_F3D0_R1.dada" ftype="dada2_dada"/>
+                <element name="derepFastq_F3D0_R1.Rdata" file="dada_F3D0_R1.Rdata" ftype="dada2_dada"/>
             </output_collection>
-        </test>
+        </test>
+        <test>
+            <param name="batch_cond|batch_select" value="no"/>
+            <param name="batch_cond|derep" value="derepFastq_F3D0_R2.Rdata" ftype="dada2_derep" />
+            <param name="errorrates" value="learnErrors_F3D0_R2.Rdata" ftype="dada2_errorrates" />
+            <output_collection name="data_collection" type="list">
+                <element name="derepFastq_F3D0_R2.Rdata" file="dada_F3D0_R2.Rdata" ftype="dada2_dada"/>
+            </output_collection>
+        </test>
     </tests>
-    <help><![CDATA[
+    <help><![CDATA[
 Description
 ...........
@@ -127,14 +137,14 @@
 
 You can decide to compute the data jointly or in batches.
 
-- Jointly (Process "samples in batches"=no): A single Galaxy job is started that processes all derep data sets jointly. You may chose different pooling strategies: if the started dada job processes the samples individually, pooled, or pseudo pooled.
-- In batches (Process "samples in batches"=yes): A separate Galaxy job is started for earch derep data set. This is equivalent to joint processing and choosing to process samples individually.
+- Jointly (Process "samples in batches"=no): A single Galaxy job is started that processes all derep data sets jointly. You may choose a pooling strategy: whether the started dada job processes the samples individually, pooled, or pseudo pooled.
+- In batches (Process "samples in batches"=yes): A separate Galaxy job is started for each derep data set. This is equivalent to joint processing with samples processed individually.
 
-While the single dada job (in case of joint processing) can use multiple cores on one compute node, batched processing distributes the work on a number of jobs (equal to the number of input derep data sets) where each can use multiple cores. Hence, if you intend to or need to process the data sets individually, batched processing is more efficient -- in particular if Galaxy has access to a larger number of compute ressources.
+While the single dada job (in the case of joint processing) can use multiple cores on one compute node, batched processing distributes the work over a number of jobs (equal to the number of input derep data sets), each of which can use multiple cores. Hence, if you intend to or need to process the data sets individually, batched processing is more efficient -- in particular if Galaxy has access to a larger number of compute resources.
 
 A typical use case for individual processing of the samples is large data sets for which the pooled strategy needs too much time or memory.
 
-**Output**: a data set of type dada2_dada.
+**Output**: a data set of type dada2_dada (an RData file containing the output of dada2's dada function).
 
 The output of this tool can serve as input for *dada2: mergePairs*, *dada2: removeBimeraDinovo*, and *dada2: makeSequenceTable*.
@@ -142,6 +152,8 @@
 .......
 
 Briefly, dada implements a statistical test for the notion that a specific sequence was seen too many times to have been caused by amplicon errors from currently inferred sample sequences. Overly abundant sequences are used as the seeds of new partitions of sequencing reads, and the final set of partitions is taken to represent the denoised composition of the sample. A more detailed explanation of the algorithm can be found in the dada2 publication (see below) and https://doi.org/10.1186/1471-2105-13-283.
 
 dada depends on a parametric error model of substitutions, so the quality of its sample inference is affected by the accuracy of the estimated error rates. All comparisons between sequences performed by dada depend on pairwise alignments. This step is the most computationally intensive part of the algorithm, and two alignment heuristics have been implemented in dada for speed: a kmer-distance screen and banded Needleman-Wunsch alignment.
+
+@HELP_OVERVIEW@
 ]]></help>
     <expand macro="citations"/>
</tool>
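The refactored wrapper above boils down to three R calls: load the serialized derep and error-rate objects with readRDS, run dada with the chosen pooling mode, and persist the result with saveRDS. A minimal sketch of that flow, assuming the dada2 package is installed and using hypothetical file names (the actual paths are filled in by Galaxy at runtime):

```r
library(dada2)

## Hypothetical inputs: dereplicated reads and learned error rates,
## serialized with saveRDS by the upstream derepFastq/learnErrors tools.
derep <- readRDS("derepFastq_F3D0_R1.Rdata")
err <- readRDS("learnErrors_F3D0_R1.Rdata")

## pool = FALSE processes samples individually, TRUE pools them,
## and "pseudo" applies pseudo pooling, matching the tool's select options.
dada_result <- dada(derep, err, pool = FALSE, multithread = 4)

## Persist the result for downstream tools (mergePairs, makeSequenceTable).
saveRDS(dada_result, file = "dada_F3D0_R1.Rdata")
```

Note that the commit replaces readRDS on an 'Rdata' file inside the dataset's extra_files_path with readRDS directly on the dataset path, which is what makes the plain file names above sufficient.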
