annotate fastx_collapser.xml @ 1:460c78dbadf8
Remove spurious version strings.
| author | Dave Bouvier <dave@bx.psu.edu> |
|---|---|
| date | Tue, 26 Nov 2013 12:47:40 -0500 |
| parents | 9246516d9dd5 |
| children | 17bfc147c9ea |
| rev | line source |
|---|---|
| 0 | 1 <tool id="cshl_fastx_collapser" version="1.0.0" name="Collapse"> |
| 2 <description>sequences</description> | |
| 3 <requirements> | |
| 4 <requirement type="package" version="0.0.13">fastx_toolkit</requirement> | |
| 5 </requirements> | |
| 6 <command>zcat -f '$input' | fastx_collapser -v -o '$output' | |
| 7 #if $input.ext == "fastqsanger": | |
| 8 -Q 33 | |
| 9 #end if | |
| 10 </command> | |
| 11 | |
| 12 <inputs> | |
| 1 | 13 <param format="fasta,fastqsanger,fastqsolexa" name="input" type="data" label="Library to collapse" /> |
| 0 | 14 </inputs> |
| 15 | |
| 16 <!-- The order of sequences in the test output differs between 32-bit and 64-bit machines. | |
| 17 <tests> | |
| 18 <test> | |
| 1 | 19 <param name="input" value="fasta_collapser1.fasta" /> |
| 20 <output name="output" file="fasta_collapser1.out" /> | |
| 0 | 21 </test> |
| 22 </tests> | |
| 23 --> | |
| 24 <outputs> | |
| 1 | 25 <data format="fasta" name="output" metadata_source="input" /> |
| 0 | 26 </outputs> |
| 27 <help> | |
| 28 | |
| 29 **What it does** | |
| 30 | |
| 31 This tool collapses identical sequences in a FASTA file into a single sequence. | |
| 32 | |
| 33 -------- | |
| 34 | |
| 35 **Example** | |
| 36 | |
| 37 Example Input File (Sequence "ATAT" appears multiple times):: | |
| 38 | |
| 39 >CSHL_2_FC0042AGLLOO_1_1_605_414 | |
| 40 TGCG | |
| 41 >CSHL_2_FC0042AGLLOO_1_1_537_759 | |
| 42 ATAT | |
| 43 >CSHL_2_FC0042AGLLOO_1_1_774_520 | |
| 44 TGGC | |
| 45 >CSHL_2_FC0042AGLLOO_1_1_742_502 | |
| 46 ATAT | |
| 47 >CSHL_2_FC0042AGLLOO_1_1_781_514 | |
| 48 TGAG | |
| 49 >CSHL_2_FC0042AGLLOO_1_1_757_487 | |
| 50 TTCA | |
| 51 >CSHL_2_FC0042AGLLOO_1_1_903_769 | |
| 52 ATAT | |
| 53 >CSHL_2_FC0042AGLLOO_1_1_724_499 | |
| 54 ATAT | |
| 55 | |
| 56 Example Output file:: | |
| 57 | |
| 58 >1-1 | |
| 59 TGCG | |
| 60 >2-4 | |
| 61 ATAT | |
| 62 >3-1 | |
| 63 TGGC | |
| 64 >4-1 | |
| 65 TGAG | |
| 66 >5-1 | |
| 67 TTCA | |
| 68 | |
| 69 .. class:: infomark | |
| 70 | |
| 71 Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded. | |
| 72 | |
| 73 The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value. | |
| 74 | |
| 75 The following output:: | |
| 76 | |
| 77 >2-4 | |
| 78 ATAT | |
| 79 | |
| 80 means that the sequence "ATAT" is the second sequence in the output file, and it appeared 4 times in the input FASTA file. | |
| 81 | |
| 82 | |
| 83 ------ | |
| 84 | |
| 85 This tool is based on `FASTX-toolkit`__ by Assaf Gordon. | |
| 86 | |
| 87 .. __: http://hannonlab.cshl.edu/fastx_toolkit/ | |
| 88 | |
| 89 </help> | |
| 90 </tool> |
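
The collapsing behaviour described in the tool's help text can be sketched in a few lines of Python. This is an illustration only, not the FASTX-toolkit implementation: the function name `collapse_fasta` is hypothetical, the sketch handles FASTA input only, and it numbers sequences in first-seen order because that is what the help example shows. The commented-out test note in the wrapper indicates that the real tool's output order can vary between machines.

```python
def collapse_fasta(lines):
    """Collapse identical sequences from FASTA lines.

    Returns a list of (name, sequence) pairs, where name is
    "<sequence number>-<multiplicity>" as described in the help text.
    Hypothetical helper for illustration; not part of FASTX-toolkit.
    """
    counts = {}    # sequence -> multiplicity; dicts keep insertion order (Python 3.7+)
    current = []   # sequence lines of the record being read

    def flush():
        # Record the sequence collected so far, if any.
        if current:
            seq = "".join(current)
            counts[seq] = counts.get(seq, 0) + 1
            current.clear()

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()  # a new header ends the previous record; the original name is discarded
        else:
            current.append(line)
    flush()  # last record has no following header

    # Assumption: numbering follows first appearance, as in the help example.
    return [(f"{i}-{n}", seq) for i, (seq, n) in enumerate(counts.items(), start=1)]


# Usage with a shortened version of the help example input.
example = [
    ">CSHL_2_FC0042AGLLOO_1_1_605_414", "TGCG",
    ">CSHL_2_FC0042AGLLOO_1_1_537_759", "ATAT",
    ">CSHL_2_FC0042AGLLOO_1_1_742_502", "ATAT",
]
for name, seq in collapse_fasta(example):
    print(f">{name}\n{seq}")
```

Keying a dictionary on the sequence string keeps the sketch short and makes the multiplicity count a single lookup per record; a production tool working on large libraries would need to consider memory use, which this sketch ignores.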
