| author | jjohnson |
|---|---|
| date | Mon, 13 Jan 2014 14:52:59 -0500 |
# sickle - A windowed adaptive trimming tool for FASTQ files using quality

## About

Most modern sequencing technologies produce reads that have
deteriorating quality towards the 3'-end, and some towards the 5'-end as well. Incorrectly called bases
in both regions negatively impact assemblies, mapping, and downstream
bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
trim the 3'-end of reads and when quality is
sufficiently high to trim the 5'-end of reads. It also discards reads based upon the
length threshold. It takes the quality values and slides a window
across them whose length is 0.1 times the length of the read. If this
length is less than 1, then the window is set to be equal to the
length of the read. Otherwise, the window slides along the quality
values until the average quality in the window rises above the threshold, at
which point the algorithm determines where within the window the rise occurs
and cuts both the read and quality strings there for the 5'-end cut. Then, when the average quality
in the window drops below the threshold, the algorithm determines where in the window
the drop occurs and cuts both the read and quality strings there for the 3'-end cut.
However, if the length of the remaining sequence is less than the minimum length threshold,
then the read is discarded entirely. 5'-end trimming can be disabled.

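The windowed trimming described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not Sickle's actual C implementation: the function name `trim_read`, its parameters, the default values, and the choice to treat "rises above" as "greater than or equal to" are all assumptions made for the example.

```python
def trim_read(seq, quals, qual_threshold=20, length_threshold=20, trim_5prime=True):
    """Return (seq, quals) after trimming, or None if the read is discarded.

    `quals` is a list of numeric quality scores, one per base. All names and
    default values here are illustrative assumptions.
    """
    n = len(quals)
    if n == 0:
        return None

    # Window length is 0.1 times the read length; fall back to the whole read
    # when that comes out below 1.
    window = int(0.1 * n)
    if window < 1:
        window = n

    five = None if trim_5prime else 0   # 5'-end cut (start of kept region)
    three = n                           # 3'-end cut (end of kept region)

    for start in range(n - window + 1):
        in_window = range(start, start + window)
        avg = sum(quals[i] for i in in_window) / window
        if five is None:
            # Assumption: "rises above" is treated as >= the threshold.
            if avg >= qual_threshold:
                # Cut at the first base in the window that meets the threshold.
                five = next(i for i in in_window if quals[i] >= qual_threshold)
        elif avg < qual_threshold:
            # Cut at the first base in the window that falls below the threshold.
            three = next(i for i in in_window if quals[i] < qual_threshold)
            break

    if five is None:
        return None  # quality never rose to the threshold
    if three - five < max(length_threshold, 1):
        return None  # remaining sequence is shorter than the length threshold
    return seq[five:three], quals[five:three]
```

For instance, a 20-base read with two low-quality bases at each end would, under these assumptions, come back as its 16 central bases when called with `length_threshold=10`.
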
Sickle also has an option to discard reads with any Ns in them.

Sickle supports three types of quality values: Illumina, Solexa,
and Sanger. Note that the Solexa quality setting is an approximation
(the actual conversion is a non-linear transformation), but the
approximation is close. Illumina quality refers to qualities encoded
with the CASAVA pipeline between versions 1.3 and 1.7. Illumina quality
using CASAVA >= 1.8 is Sanger encoded.

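As a rough illustration of how the three encodings differ, the sketch below decodes a single quality character. It uses the commonly documented ASCII offsets (33 for Sanger, 64 for Illumina 1.3 to 1.7 and Solexa) and the standard non-linear Solexa-to-Phred formula. It does not reproduce the approximation Sickle itself uses for Solexa, and the names `OFFSETS` and `decode_quality` are hypothetical.

```python
import math

# ASCII offsets for each encoding (assumed names; values are the usual ones).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def decode_quality(char, qual_type):
    """Convert one FASTQ quality character to a Phred-scale score."""
    raw = ord(char) - OFFSETS[qual_type]
    if qual_type == "solexa":
        # Exact Solexa -> Phred conversion: Q = 10 * log10(10^(Qs/10) + 1).
        return 10 * math.log10(10 ** (raw / 10) + 1)
    return float(raw)
```

The non-linear term only matters for low scores: a Solexa score of 40 maps to a Phred score within a fraction of a point of 40, whereas a Solexa score of 0 maps to roughly 3.
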
Note that Sickle will remove the 2nd fastq record header (on the "+" line) and replace it
with simply a "+". This is the default format for CASAVA >= 1.8.

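To make the effect on a record concrete, here is a minimal sketch of that normalization for one four-line FASTQ record. The helper name `strip_plus_header` is made up for the example and is not part of Sickle.

```python
def strip_plus_header(record_lines):
    """Replace whatever follows "+" on the third line of a FASTQ record."""
    header, seq, _plus, qual = record_lines
    return [header, seq, "+", qual]
```

A record whose third line is `+read1` would come out with a bare `+` on that line; the header, sequence, and quality lines are left as they were.
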
Sickle also supports gzipped file inputs. There is also a sickle.xml file
included in the package that can be used to add sickle to your local [Galaxy](http://galaxy.psu.edu/) server.

## Requirements

Sickle requires a C compiler; GCC or clang are recommended. Sickle
relies on Heng Li's kseq.h, which is bundled with the source.

Sickle also requires Zlib, which can be obtained at
<http://www.zlib.net/>.

## Building and Installing Sickle

To build Sickle, enter:

    make

Then, copy or move "sickle" to a directory in your $PATH.

## Usage

Sickle has two modes, one for single-end and one for paired-end
reads: `sickle se` and `sickle pe`.

Running sickle by itself will print the help:

    sickle

Running sickle with either the "se" or "pe" command will give help
specific to that command:

    sickle se
    sickle pe

### Sickle Single End (`sickle se`)

`sickle se` takes an input fastq file and outputs a trimmed version of
that file. It also has options to change the length and quality
thresholds for trimming, as well as to disable 5'-trimming and enable removal
of sequences with Ns.

#### Examples

    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n

### Sickle Paired End (`sickle pe`)

`sickle pe` takes two paired-end files as input and outputs two
trimmed paired-end files as well as a "singles" file. The "singles"
file contains reads that passed the filter in one of the paired-end files
but not the other. You can also change the length and quality
thresholds for trimming, as well as disable 5'-trimming and enable removal
of sequences with Ns.

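The way reads are split between the paired outputs and the "singles" file can be pictured with the small sketch below. It assumes each mate has already been trimmed and is represented either by a record or by `None` if it failed the filter. The function `route_pair` and the output keys are hypothetical names, not Sickle's internals.

```python
def route_pair(fwd, rev):
    """Decide where the two mates of a pair go after trimming.

    `fwd` and `rev` are trimmed records, or None if that mate was discarded.
    Returns a dict with the record (or None) destined for each output.
    """
    if fwd is not None and rev is not None:
        # Both mates survived: keep them together in the paired outputs.
        return {"paired1": fwd, "paired2": rev, "singles": None}
    if fwd is not None:
        return {"paired1": None, "paired2": None, "singles": fwd}
    if rev is not None:
        return {"paired1": None, "paired2": None, "singles": rev}
    # Neither mate survived: nothing is written.
    return {"paired1": None, "paired2": None, "singles": None}
```

Routing the survivor of a broken pair to a separate file keeps the two paired outputs in step with each other, so the n-th record in one still corresponds to the n-th record in the other.
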
#### Examples

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -q 12 -l 15

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -n
