comparison README.md @ 0:d883d8d86977 draft default tip

Uploaded
author jjohnson
date Mon, 13 Jan 2014 14:52:59 -0500
-1:000000000000 0:d883d8d86977
# sickle - A windowed adaptive trimming tool for FASTQ files using quality

## About

Most modern sequencing technologies produce reads that have
deteriorating quality towards the 3'-end and some towards the 5'-end as well. Incorrectly called bases
in both regions negatively impact assemblies, mapping, and downstream
bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
trim the 3'-end of reads, and also determines when the quality is
sufficiently high to trim the 5'-end of reads. It will also discard reads based upon the
length threshold.

Sickle takes the quality values and slides a window
across them whose length is 0.1 times the length of the read. If this
length is less than 1, then the window is set to be equal to the
length of the read. Otherwise, the window slides along the quality
values until the average quality in the window rises above the threshold, at
which point the algorithm determines where within the window the rise occurs
and cuts the read and quality strings there for the 5'-end cut. Then, when the average quality
in the window drops below the threshold, the algorithm determines where in the window
the drop occurs and cuts both the read and quality strings there for the 3'-end cut.
However, if the length of the remaining sequence is less than the minimum length threshold,
then the read is discarded entirely. 5'-end trimming can be disabled.
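
The windowed trimming described above can be sketched in Python. This is a minimal illustration written from the prose description, not a port of Sickle's C source. The function name `trim_window`, its parameters, and the default thresholds of 20 are hypothetical. Two choices are assumptions of this sketch: "where the rise occurs" is taken to be the first base in the window at or above the threshold, and a read whose window average never reaches the threshold is discarded.

```python
def trim_window(quals, qual_threshold=20, min_length=20, trim_five_prime=True):
    """Return (start, end) of the slice to keep, or None if the read is discarded.

    quals is a list of numeric quality scores, one per base.
    """
    n = len(quals)
    if n == 0:
        return None

    # Window is 0.1 x read length; if that is less than 1, use the whole read.
    window = int(0.1 * n) or n

    start, end = 0, n
    found_start = not trim_five_prime  # skip the 5' search when disabled

    for i in range(n - window + 1):
        avg = sum(quals[i:i + window]) / window

        if not found_start and avg >= qual_threshold:
            # 5' cut: first base in this window at or above the threshold.
            start = next(j for j in range(i, i + window)
                         if quals[j] >= qual_threshold)
            found_start = True

        if found_start and avg < qual_threshold:
            # 3' cut: first base in this window below the threshold.
            end = next(j for j in range(i, i + window)
                       if quals[j] < qual_threshold)
            break

    if not found_start:
        return None  # assumption: quality never rose above the threshold

    kept = end - start
    if kept <= 0 or kept < min_length:
        return None  # remaining sequence is shorter than the length threshold
    return (start, end)
```

For a 40-base read with five low-quality bases at each end, `trim_window([2] * 5 + [30] * 30 + [2] * 5)` keeps the slice `(5, 35)`; the same slice would be applied to both the sequence and the quality string.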

Sickle also has an option to discard reads with any Ns in them.

Sickle supports three types of quality values: Illumina, Solexa,
and Sanger. Note that the Solexa quality setting is an approximation
(the actual conversion is a non-linear transformation). The resulting
approximation is close. Illumina quality refers to qualities encoded
with the CASAVA pipeline between versions 1.3 and 1.7. Illumina quality
using CASAVA >= 1.8 is Sanger encoded.
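
To make the three encodings concrete, here is a small Python sketch that decodes a quality string into numeric scores. It relies on the commonly documented conventions: Sanger uses an ASCII offset of 33, while Illumina 1.3 to 1.7 and Solexa use an offset of 64. For Solexa it applies the exact non-linear conversion to Phred scale rather than the approximation Sickle uses, so results may differ slightly from Sickle's. The names `decode_quality` and `solexa_to_phred` are hypothetical.

```python
import math

# ASCII offsets for each quality type (names match the -t values used below).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def solexa_to_phred(q_solexa):
    """Exact Solexa-to-Phred conversion: 10 * log10(10^(Q/10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_quality(qual_string, qual_type):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    offset = OFFSETS[qual_type]
    scores = [ord(char) - offset for char in qual_string]
    if qual_type == "solexa":
        # Solexa scores are on a different scale and can be negative.
        scores = [round(solexa_to_phred(q)) for q in scores]
    return scores
```

The same character means different things under each setting: `I` decodes to 40 as Sanger but only to 9 as Illumina, which is why choosing the wrong `-t` value would skew trimming.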

Note that Sickle will remove the second FASTQ record header (on the "+" line) and replace it
with simply a "+". This is the default format for CASAVA >= 1.8.
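
The effect on a single record can be illustrated with a short Python sketch. It assumes a record is held as four strings (header, sequence, "+" line, quality); the helper name `normalize_plus_line` is hypothetical and is not part of Sickle.

```python
def normalize_plus_line(record):
    """Return a 4-line FASTQ record whose third line is a bare '+'."""
    header, sequence, plus_line, quality = record
    if not plus_line.startswith("+"):
        raise ValueError("third line of a FASTQ record must start with '+'")
    # Drop any repeated header text after the '+'.
    return (header, sequence, "+", quality)
```

For example, `("@read1", "ACGT", "+read1", "IIII")` becomes `("@read1", "ACGT", "+", "IIII")`.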

Sickle also supports gzipped file inputs. There is also a sickle.xml file
included in the package that can be used to add sickle to your local [Galaxy](http://galaxy.psu.edu/) server.

## Requirements

Sickle requires a C compiler; GCC or clang are recommended. Sickle
relies on Heng Li's kseq.h, which is bundled with the source.

Sickle also requires Zlib, which can be obtained at
<http://www.zlib.net/>.

## Building and Installing Sickle

To build Sickle, enter:

    make

Then, copy or move "sickle" to a directory in your $PATH.

## Usage

Sickle has two modes, one for single-end reads and one for paired-end
reads: `sickle se` and `sickle pe`.

Running sickle by itself will print the help:

    sickle

Running sickle with either the "se" or "pe" commands will give help
specific to those commands:

    sickle se
    sickle pe

### Sickle Single End (`sickle se`)

`sickle se` takes an input fastq file and outputs a trimmed version of
that file. It also has options to change the length and quality
thresholds for trimming, to disable 5'-trimming, and to enable removal
of sequences with Ns.

#### Examples

    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n

### Sickle Paired End (`sickle pe`)

`sickle pe` takes two paired-end files as input and outputs two
trimmed paired-end files as well as a "singles" file. The "singles"
file contains reads that passed the filter in one of the paired-end files
but not the other. You can also change the length and quality
thresholds for trimming, disable 5'-trimming, and enable removal
of sequences with Ns.
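
The routing of reads between the paired outputs and the "singles" file can be sketched as follows. This is an illustration of the behavior described above, not Sickle's implementation. The function name `route_pair` and the destination labels are hypothetical, and the sketch assumes each mate has already been trimmed and judged as passing or failing.

```python
def route_pair(forward_passed, reverse_passed):
    """Return the destination for (forward read, reverse read).

    Each destination is "paired", "singles", or None (read is dropped).
    """
    if forward_passed and reverse_passed:
        # Both mates survive: each goes to its own trimmed paired-end file.
        return ("paired", "paired")
    if forward_passed:
        # Only the forward mate survives: it goes to the singles file.
        return ("singles", None)
    if reverse_passed:
        # Only the reverse mate survives: it goes to the singles file.
        return (None, "singles")
    # Neither mate survives: nothing is written.
    return (None, None)
```

Keeping survivors of broken pairs in a separate file means the two paired output files stay in step, record for record.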

#### Examples

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -q 12 -l 15

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -n