comparison genetrack.xml @ 17:5a6ea187933b draft

Uploaded
author greg
date Wed, 16 Dec 2015 19:53:24 -0500
parents b40ad4bee6cb
children e1d437bd7d36
comparison
16:b40ad4bee6cb 17:5a6ea187933b
135 * **Sigma to use when smoothing reads** - Smooths clusters of tags via a Gaussian distribution.
136 * **Peak exclusion zone** - Exclusion zone around each peak, eliminating all other peaks on the same strand that are within a ± bp distance of the peak.
137 * **Exclusion zone of upstream called peaks** - Defines the exclusion zone centered over peaks upstream of a peak.
138 * **Exclusion zone of downstream called peaks** - Defines the exclusion zone centered over peaks downstream of a peak.
139 * **Filter** - Absolute read filter, restricts output to only peaks with larger peak height.
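
The peak exclusion zone described above can be illustrated with a greedy selection. This is only a sketch of the general idea and not necessarily how the tool implements it: the function name ``call_peaks``, the tuple layout, and the ``distance`` parameter (bp excluded on each side of an accepted peak) are assumptions made for illustration.

```python
def call_peaks(candidates, distance):
    """Greedy selection: tallest first, skipping anything too close to an accepted peak.

    candidates: iterable of (position, strand, height) tuples (hypothetical layout).
    distance:   assumed exclusion distance in bp applied on each side of a peak.
    """
    accepted = []
    # Visit candidates from tallest to shortest so taller peaks win conflicts.
    for pos, strand, height in sorted(candidates, key=lambda c: -c[2]):
        blocked = any(
            strand == s and distance >= abs(pos - p)
            for p, s, _ in accepted
        )
        if not blocked:
            accepted.append((pos, strand, height))
    # Return the surviving peaks in coordinate order.
    return sorted(accepted)
```

In this sketch a shorter peak is dropped only when a taller peak on the same strand lies within ``distance`` bp, so peaks on opposite strands never exclude each other.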
140
141 -----
142
143 **Output GFF Columns**
144
145 1. Chromosome
146 2. Script
147 3. Placeholder (no meaning)
148 4. Start of peak exclusion zone (-e 20)
149 5. End of peak exclusion zone
150 6. Tag sum (not peak height or area under curve, which LionDB provides)
151 7. Strand
152 8. Placeholder (no meaning)
153 9. Attributes (standard deviation of reads located within the exclusion zone, i.e. the fuzziness of the peak)
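
For downstream scripts, the nine columns listed above can be read with a few lines of Python. This is a minimal sketch that assumes tab-separated rows; the ``Peak`` type and the function name ``parse_gff_line`` are hypothetical, and the attribute column is kept as raw text because its exact layout is not specified here.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str       # column 1: chromosome
    start: int       # column 4: start of the peak exclusion zone
    end: int         # column 5: end of the peak exclusion zone
    tag_sum: float   # column 6: tag sum
    strand: str      # column 7: strand
    attributes: str  # column 9: attributes, kept as raw text


def parse_gff_line(line: str) -> Peak:
    """Split one tab-separated output row into the columns described above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    # Columns 2, 3 and 8 (script and the two placeholders) are ignored.
    chrom, _script, _unused3, start, end, tag_sum, strand, _unused8, attributes = fields
    return Peak(chrom, int(start), int(end), float(tag_sum), strand, attributes)
```

For example, a made-up row such as ``chr1  genetrack  .  100  120  15  +  .  stddev=2.5`` (tab-separated) would yield a ``Peak`` whose ``start`` is 100 and whose ``tag_sum`` is 15.0.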
154
155 -----
156
157 **Considerations**
158
159 In principle, the width of the exclusion zone may be as large as the DNA region occupied by the native protein
160 plus a steric exclusion zone between the protein and the exonuclease. On the other hand, the site might be considerably
161 smaller if the protein is in a denatured state during exonuclease digestion (since it is pre-treated with SDS).
162
163 In general, higher resolution data or smaller binding site size data should use smaller sigma values. Large binding site
164 size data, such as 147 bp nucleosomal DNA, should use a larger sigma value such as 20 (-s 20). For transcription factors mapped by
165 ChIP-exo, sigma may initially be set at 5 and the exclusion zone at 20 (-s 5 -e 20). Sigma is typically varied
166 between ~3 and ~20. Too high a sigma value may merge two independent nearby binding events, which may be desirable if
167 closely bound factors are not distinguishable. Too low a sigma value will cause some tags that contribute to a binding
168 event to be excluded, because they may not be located sufficiently close to the main peak. If alternative (mutually
169 exclusive) binding is expected for two overlapping sites, and these sites are to be recorded independently, then a
170 smaller, empirically determined exclusion zone width should be set. Thus, the value of sigma is set empirically for each mapped
171 factor, depending upon the resolution and binding site size of the binding event.
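
The effect of sigma on nearby binding events can be seen in a small simulation. The sketch below is plain Python written for illustration and is not the tool's own smoothing code; the function names and the toy data (two clusters of 10 tags placed 30 bp apart) are assumptions. It relies on the fact that two equal Gaussians merge into a single maximum once their separation is at most twice sigma.

```python
import math


def smooth(tag_counts, sigma):
    """Spread every tag over the whole region using a Gaussian of width sigma (bp)."""
    n = len(tag_counts)
    smoothed = [0.0] * n
    for pos, count in enumerate(tag_counts):
        if count == 0:
            continue
        for target in range(n):
            d = target - pos
            smoothed[target] += count * math.exp(-(d * d) / (2.0 * sigma * sigma))
    return smoothed


def count_local_maxima(values):
    """Count positions that are strictly higher than both neighbours."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


# Toy data: 10 tags at position 100 and 10 tags at position 130.
tags = [0] * 231
tags[100] = 10
tags[130] = 10

print(count_local_maxima(smooth(tags, 5)))   # two separate peaks remain
print(count_local_maxima(smooth(tags, 20)))  # the two events merge into one peak
```

With the toy data, a sigma of 5 keeps the two events apart while a sigma of 20 merges them, which mirrors the trade-off described above.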
172
173 It might make sense to exclude peaks that have only a single tag (using -F 1) or whose tags are all located on
174 a single coordinate (called singletons, with stddev=0 in the output file). However, low coverage datasets might be
175 improved by including them, if additional analysis (e.g., motif discovery) validates them. In addition, idealized action
176 of the exonuclease in ChIP-exo might place all tags for a peak on a single coordinate.
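
Such filtering can also be applied after the fact to an existing output file. The following sketch makes several assumptions: that the attribute column contains an entry of the form ``stddev=VALUE`` (possibly among other ``;``-separated entries), that the tag sum is the quantity being thresholded, and that a peak is kept only when its tag sum is strictly larger than the threshold. The function names are hypothetical.

```python
def stddev_from_attributes(attributes):
    """Extract the value of a 'stddev=' entry from a ';'-separated attribute string."""
    for entry in attributes.split(";"):
        key, _, value = entry.strip().partition("=")
        if key == "stddev":
            return float(value)
    raise ValueError("no stddev entry in %r" % attributes)


def keep_peak(tag_sum, attributes, min_tags=1, drop_singletons=True):
    """Return True if the peak passes the tag-count and singleton checks."""
    # Assumed behaviour: keep only peaks with more than min_tags tags.
    if not tag_sum > min_tags:
        return False
    # Singletons have all tags on one coordinate, hence a stddev of 0.
    if drop_singletons and stddev_from_attributes(attributes) == 0:
        return False
    return True
```

Setting ``drop_singletons=False`` keeps peaks whose tags all fall on one coordinate, which may be preferable for low coverage or idealized ChIP-exo data as noted above.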
177
140 </help> 178 </help>
141 <expand macro="citations" /> 179 <expand macro="citations" />
142 </tool> 180 </tool>