annotate kraken-filter.xml @ 1:f093ba52debe draft
planemo upload for repository https://github.com/galaxyproject/tools-devteam/blob/master/tool_collections/kraken/kraken_filter/ commit cb1743eafd4ca98be0148d557770ef8635cc8d4c-dirty
| author | devteam |
|---|---|
| date | Tue, 19 May 2015 16:42:21 -0400 |
| parents | 60d9479c58d6 |
| children | 317726be0703 |
| rev | line source |
|---|---|
| 0 | 1 <tool id="kraken-filter" name="Filter Kraken" version="1.0.0"> |
| 2 <description> | |
| 3 by confidence score | |
| 4 </description> | |
| 5 <macros> | |
| 6 <import>macros.xml</import> | |
| 7 </macros> | |
| 8 <command> | |
| 9 <![CDATA[ | |
| 1 | 10 @SET_DATABASE_PATH@ && |
| 0 | 11 kraken-filter @INPUT_DATABASE@ --threshold $threshold "${input}" > "$filtered_output" |
| 12 ]]> | |
| 13 </command> | |
| 14 <inputs> | |
| 15 <param format="tabular" label="Kraken classified output" name="input" type="data" /> | |
| 16 <param label="Confidence threshold" max="1" min="0" name="threshold" type="float" value="0" /> | |
| 17 <expand macro="input_database" /> | |
| 18 </inputs> | |
| 19 <outputs> | |
| 20 <data format="tabular" name="filtered_output" /> | |
| 21 </outputs> | |
| 22 <help> | |
| 23 <![CDATA[ | |
| 24 | |
| 25 ***Note that the database used must be the same as the one used to generate | |
| 26 the output file, or kraken-filter may encounter problems.*** | |
| 27 | |
| 28 A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the following example of LCA mappings from Kraken's output: | |
| 29 | |
| 30 "562:13 561:4 A:31 0:1 562:3" would indicate that: | |
| 31 | |
| 32 the first 13 k-mers mapped to taxonomy ID #562 | |
| 33 the next 4 k-mers mapped to taxonomy ID #561 | |
| 34 the next 31 k-mers contained an ambiguous nucleotide | |
| 35 the next k-mer was not in the database | |
| 36 the last 3 k-mers mapped to taxonomy ID #562 | |
| 37 | |
| 38 In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified. | |
| 39 ]]> | |
| 40 </help> | |
| 41 <expand macro="version_command" /> | |
| 42 <expand macro="requirements" /> | |
| 43 <expand macro="stdio" /> | |
| 44 <expand macro="citations" /> | |
| 45 </tool> |
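
The C/Q arithmetic in the help text can be checked with a short Python sketch. This is an illustration of the scoring rule as the help text describes it, not kraken-filter's actual implementation. The function names (`parse_lca_mappings`, `label_score`, `filter_label`), the hand-written clade sets, and the use of "0" for an unclassified sequence are assumptions made for this example. A label is assumed to be kept when its score is at least the threshold, which matches the statement that a threshold over 16/21 moves the label from #562 to #561.

```python
from fractions import Fraction


def parse_lca_mappings(field):
    """Split a string like "562:13 561:4 A:31 0:1 562:3" into (token, count) pairs."""
    pairs = []
    for item in field.split():
        token, count = item.rsplit(":", 1)
        pairs.append((token, int(count)))
    return pairs


def label_score(field, clade):
    """Return C/Q for a label; `clade` is the set of taxonomy IDs in the clade rooted at it."""
    pairs = parse_lca_mappings(field)
    # Q: k-mers that were queried against the database, i.e. everything except
    # the ambiguous-nucleotide runs marked "A".
    q = sum(n for token, n in pairs if token != "A")
    # C: k-mers whose LCA lies inside the clade rooted at the label.
    c = sum(n for token, n in pairs if token in clade)
    return Fraction(c, q) if q else Fraction(0)


def filter_label(field, lineage, threshold):
    """Walk from the original label toward the root and return the first label
    whose score is at least `threshold`, or "0" (assumed: unclassified).

    `lineage` is a hypothetical hand-built list of (label, clade) pairs,
    ordered from the original label up to its ancestors.
    """
    for label, clade in lineage:
        if label_score(field, clade) >= threshold:
            return label
    return "0"


example = "562:13 561:4 A:31 0:1 562:3"
# In the help text's example, #561 is the parent of #562.
lineage = [("562", {"562"}), ("561", {"561", "562"})]

print(label_score(example, {"562"}))         # score for label #562
print(label_score(example, {"561", "562"}))  # score for label #561
print(filter_label(example, lineage, 0.8))   # threshold between 16/21 and 20/21
```

With a threshold of 0.8, which lies between 16/21 and 20/21, the sketch moves the label up to #561. A threshold above 20/21 leaves no label that qualifies, so the sequence is reported as unclassified.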
