annotate README.md @ 0:d883d8d86977 draft default tip

Uploaded
author jjohnson
date Mon, 13 Jan 2014 14:52:59 -0500

# sickle - A windowed adaptive trimming tool for FASTQ files using quality

## About

Most modern sequencing technologies produce reads whose quality
deteriorates towards the 3'-end, and some towards the 5'-end as well.
Incorrectly called bases in both regions negatively impact assemblies,
mapping, and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
trim the 3'-end of reads, and when quality is sufficiently high to
trim the 5'-end of reads. It also discards reads based upon the
length threshold.

Sickle takes the quality values and slides a window across them whose
length is 0.1 times the length of the read. If this length is less
than 1, then the window is set to be equal to the length of the read.
Otherwise, the window slides along the quality values until the
average quality in the window rises above the threshold, at which
point the algorithm determines where within the window the rise occurs
and cuts the read and quality strings there for the 5'-end cut. Then,
when the average quality in the window drops below the threshold, the
algorithm determines where in the window the drop occurs and cuts both
the read and quality strings there for the 3'-end cut. If the length
of the remaining sequence is less than the minimum length threshold,
the read is discarded entirely. 5'-end trimming can be disabled.
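
The windowed procedure described above can be sketched in Python. This is a minimal illustration written from the description in this README, not Sickle's C implementation. The function name `trim_read`, its parameters, the default thresholds, the use of `>=` for "rises above", and the choice to discard a read in which no window ever reaches the threshold are all assumptions made for the sketch.

```python
def trim_read(quals, qual_threshold=20, length_threshold=20, trim_five_prime=True):
    """Return (start, end) slice bounds of the bases to keep, or None to discard.

    quals is a list of already-decoded integer quality scores.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    n = len(quals)
    if n == 0:
        return None

    window = n // 10          # 0.1 times the read length, rounded down
    if window < 1:
        window = n            # read too short: the window spans the whole read

    start, end = 0, n
    found_start = not trim_five_prime   # skip the 5'-end search if disabled

    for i in range(n - window + 1):
        win = quals[i:i + window]
        avg = sum(win) / window

        if not found_start:
            if avg < qual_threshold:
                continue
            # 5'-end cut: first base in this window that meets the threshold
            start = next(i + j for j, q in enumerate(win) if q >= qual_threshold)
            found_start = True
        elif avg < qual_threshold:
            # 3'-end cut: first base in this window that falls below the threshold
            end = max(start, next(i + j for j, q in enumerate(win) if q < qual_threshold))
            break

    if not found_start:
        return None           # assumption: no window ever reached the threshold
    if end - start < length_threshold:
        return None           # remaining sequence is too short
    return start, end
```

Under these assumptions, a 40-base read with five low-quality bases at each end, `trim_read([2]*5 + [40]*30 + [2]*5)`, keeps bases 5 through 34, and the same bounds would be applied to both the sequence and the quality string.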

Sickle also has an option to discard reads with any Ns in them.

Sickle supports three types of quality values: Illumina, Solexa,
and Sanger. Note that the Solexa quality setting is an approximation
(the actual conversion is a non-linear transformation), but the end
result is close. Illumina quality refers to qualities encoded
with the CASAVA pipeline between versions 1.3 and 1.7. Illumina quality
using CASAVA >= 1.8 is Sanger encoded.
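
To make the three encodings concrete, here is a small Python sketch that decodes a quality string into integer scores. It relies on the commonly documented conventions: Sanger uses an ASCII offset of 33, while Illumina 1.3 to 1.7 and Solexa use an offset of 64. For Solexa it applies the exact non-linear conversion to Phred scale, which is not what Sickle does (Sickle uses an approximation, as noted above). The names `OFFSETS`, `solexa_to_phred`, and `decode_qualities` are hypothetical.

```python
import math

# ASCII offset for each quality type (keys are illustrative labels).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def solexa_to_phred(q_solexa):
    """Exact non-linear conversion from a Solexa score to a Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_qualities(qual_string, qual_type):
    """Turn a FASTQ quality string into a list of integer scores."""
    offset = OFFSETS[qual_type]
    scores = [ord(c) - offset for c in qual_string]
    if qual_type == "solexa":
        # Convert to Phred scale and round to the nearest integer.
        scores = [round(solexa_to_phred(q)) for q in scores]
    return scores
```

The two scales differ mostly at the low end: a Solexa score of 0 corresponds to a Phred score of about 3, while at a score of 40 the difference is negligible.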

Note that Sickle will remove the second FASTQ record header (on the "+" line) and replace it
with simply a "+". This is the default format for CASAVA >= 1.8.
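
The effect of this on a single record can be illustrated with a short Python sketch. It assumes the common four-line FASTQ layout (header, sequence, "+" line, qualities); the function name `normalize_plus_line` is hypothetical and this is not Sickle's own code.

```python
def normalize_plus_line(record_lines):
    """Replace a repeated header on the "+" line with a bare "+".

    record_lines is a list of the four lines of one FASTQ record,
    without trailing newlines (an assumption of this sketch).
    """
    header, seq, plus, qual = record_lines
    if not plus.startswith("+"):
        raise ValueError("third line of a FASTQ record must start with '+'")
    return [header, seq, "+", qual]
```

For instance, a record whose third line is `+read1` comes out with a third line of just `+`, while the header, sequence, and quality lines are left unchanged.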

Sickle also supports gzipped input files. The package also includes a sickle.xml file
that can be used to add Sickle to your local [Galaxy](http://galaxy.psu.edu/) server.
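
A script that prepares or inspects Sickle's inputs may want the same flexibility. The following Python sketch opens a FASTQ file whether or not it is gzipped by checking the two-byte gzip magic number. The helper name `open_fastq` is hypothetical, and this is only an illustration of the idea, not a description of how Sickle reads its input.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file for text reading, whether gzipped or plain."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"   # gzip magic number
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the file contents rather than the file extension means a gzipped file without a `.gz` suffix is still handled.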

## Requirements

Sickle requires a C compiler; GCC or Clang is recommended. Sickle
relies on Heng Li's kseq.h, which is bundled with the source.

Sickle also requires zlib, which can be obtained at
<http://www.zlib.net/>.

## Building and Installing Sickle

To build Sickle, enter:

    make

Then, copy or move "sickle" to a directory in your $PATH.

## Usage

Sickle has two modes, one for single-end and one for paired-end
reads: `sickle se` and `sickle pe`.

Running sickle by itself will print the help:

    sickle

Running sickle with either the "se" or "pe" command will print help
specific to that command:

    sickle se
    sickle pe

### Sickle Single End (`sickle se`)

`sickle se` takes an input FASTQ file and outputs a trimmed version of
that file. It also has options to change the length and quality
thresholds for trimming, to disable 5'-trimming, and to enable removal
of sequences with Ns.

#### Examples

    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n

### Sickle Paired End (`sickle pe`)

`sickle pe` takes two paired-end files as input and outputs two
trimmed paired-end files as well as a "singles" file. The "singles"
file contains reads that passed the filter in one of the paired-end files
but not the other. You can also change the length and quality
thresholds for trimming, disable 5'-trimming, and enable removal
of sequences with Ns.
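
The routing of a read pair into the two paired outputs and the "singles" output can be sketched as follows. This Python example only illustrates the rule stated above. The function `route_pair`, the output keys, and the `passes_filter` predicate (standing in for trimming plus the length check) are all hypothetical.

```python
def route_pair(fwd, rev, passes_filter):
    """Decide which output each read of a pair belongs to.

    fwd and rev are the forward and reverse reads of one pair;
    passes_filter is a caller-supplied predicate (an assumption of
    this sketch) that returns True when a read survives trimming.
    """
    out = {"paired_1": [], "paired_2": [], "singles": []}
    fwd_ok = passes_filter(fwd)
    rev_ok = passes_filter(rev)

    if fwd_ok and rev_ok:
        # Both reads survive: keep them in step in the two paired files.
        out["paired_1"].append(fwd)
        out["paired_2"].append(rev)
    elif fwd_ok:
        out["singles"].append(fwd)   # only the forward read survives
    elif rev_ok:
        out["singles"].append(rev)   # only the reverse read survives
    # If neither read survives, the pair is dropped entirely.
    return out
```

Keeping the two paired outputs in step matters because downstream tools typically expect the n-th record of each paired file to belong to the same fragment.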

#### Examples

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -q 12 -l 15

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -n