comparison genetrack.xml @ 15:ebafcd6c3e0e draft

Uploaded
author greg
date Wed, 16 Dec 2015 12:43:31 -0500
parents cd105fdfb0da
children b40ad4bee6cb
comparison
14:6ad44f393892 15:ebafcd6c3e0e
38 </conditional> 38 </conditional>
39 <param name="sigma" type="integer" value="5" min="1" label="Sigma to use when smoothing reads" help="Higher values increase computation but produce more smoothing." /> 39 <param name="sigma" type="integer" value="5" min="1" label="Sigma to use when smoothing reads" help="Higher values increase computation but produce more smoothing." />
40 <param name="exclusion" type="integer" value="20" min="1" label="Peak exclusion zone" help="Exclusion zone around each peak that prevents others from being called." /> 40 <param name="exclusion" type="integer" value="20" min="1" label="Peak exclusion zone" help="Exclusion zone around each peak that prevents others from being called." />
41 <param name="up_width" type="integer" value="10" min="0" label="Exclusion zone of upstream called peaks" /> 41 <param name="up_width" type="integer" value="10" min="0" label="Exclusion zone of upstream called peaks" />
42 <param name="down_width" type="integer" value="10" min="0" label="Exclusion zone of downstream called peaks" /> 42 <param name="down_width" type="integer" value="10" min="0" label="Exclusion zone of downstream called peaks" />
43 <param name="filter" type="integer" value="3" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." /> 43 <param name="filter" type="integer" value="1" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." />
44 </inputs> 44 </inputs>
45 <outputs> 45 <outputs>
46 <collection name="genetrack_output" type="list" label="Genetrack results on ${on_string}"> 46 <collection name="genetrack_output" type="list" label="Genetrack results on ${on_string}">
47 <discover_datasets pattern="(?P&lt;designation&gt;.*)" directory="output" ext="gff" visible="false" /> 47 <discover_datasets pattern="(?P&lt;designation&gt;.*)" directory="output" ext="gff" visible="false" />
48 </collection> 48 </collection>
86 </test> 86 </test>
87 </tests> 87 </tests>
88 <help> 88 <help>
89 **What it does** 89 **What it does**
90 90
91 <![CDATA[ 91 GeneTrack separately identifies peaks on the forward "+" (W) and reverse "-" (C) strands. The way that GeneTrack
92 92 works is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped
93 GeneTrack separately identifies peaks on the forward "+" and reverse "-" strands. The way that GeneTrack works
94 is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped
95 genomic coordinate. The distance decay of the probabilistic distribution is set by adjusting the value of the 93 genomic coordinate. The distance decay of the probabilistic distribution is set by adjusting the value of the
96 tool's **Sigma to use when smoothing reads** parameter. GeneTrack then sums the distribution over all mapped 94 tool's **Sigma to use when smoothing reads** parameter. GeneTrack then sums the distribution over all mapped
97 tags. This results in a smooth continuous trace that can be globally broadened or tightened by adjusting the 95 tags. This results in a smooth continuous trace that can be globally broadened or tightened by adjusting the
98 sigma value. GeneTrack starts with the highest smoothed peak first, treating each strand separately if indicated 96 sigma value. GeneTrack starts with the highest smoothed peak first, treating each strand separately if indicated
99 by the data, then sets up an exclusion zone (centered over the peak) defined by the value of the **Peak exclusion 97 by the data, then sets up an exclusion zone (centered over the peak) defined by the value of the **Peak exclusion
100 zone** parameter (see figure). The exclusion zone prevents any secondary peaks from being called on the same strand 98 zone** parameter (see figure). The exclusion zone prevents any secondary peaks from being called on the same strand
101 within that exclusion zone. In rare cases, it may be desirable to set different exclusion zones upstream (more 5’) 99 within that exclusion zone. In rare cases, it may be desirable to set different exclusion zones upstream (more 5’)
102 versus downstream (more 3’) of the peak. 100 versus downstream (more 3’) of the peak.
103 101
104 ]]> 102 .. image:: $PATH_TO_IMAGES/genetrack.png
105 103
106 .. image:: $PATH_TO_IMAGES/genetrack.png 104 GeneTrack continues through the data in order of peak height, until no other peaks are found, and in principle will
105 call a peak at a single isolated tag, if no filter is set using the tool's **Absolute read filter** parameter. A
106 filter value of 1 means that it will stop calling peaks when the tag count in the peak hits 1 (so single-tag peaks
107 will be excluded in this case). GeneTrack outputs **chrom** (chromosome number), **strand** (+/W or -/C strand),
108 **start** (lower coordinate of exclusion zone), **end** (higher coordinate of exclusion zone), and **value** (peak
109 height). GeneTrack's GFF output reports the start (lower coordinate) and end (higher coordinate) of the exclusion
110 zone.
111
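The smoothing and peak-calling procedure described above can be sketched in Python. This is a minimal illustration, not GeneTrack's actual implementation: the function name `call_peaks`, the truncation of the Gaussian at 3 sigma, the use of half the exclusion value on each side of the peak, and the use of the tag count inside the zone as the peak value are all assumptions made for this example.

```python
import math


def call_peaks(tags, sigma=5, exclusion=20, read_filter=1):
    """Call peaks on ONE strand; returns (start, end, value) tuples.

    Hypothetical helper for illustration only (not part of GeneTrack).
    """
    if not tags:
        return []
    width = 3 * sigma                 # assumption: kernel truncated at 3 sigma
    offset = min(tags) - width        # genomic coordinate of trace[0]
    trace = [0.0] * (max(tags) - min(tags) + 2 * width + 1)

    # Replace every tag with a Gaussian centred on its coordinate and sum them.
    for pos in tags:
        for d in range(-width, width + 1):
            trace[pos + d - offset] += math.exp(-d * d / (2.0 * sigma * sigma))

    # Candidate peaks are local maxima of the smoothed trace, highest first.
    candidates = [i for i in range(1, len(trace) - 1)
                  if trace[i - 1] < trace[i] >= trace[i + 1]]
    candidates.sort(key=lambda i: trace[i], reverse=True)

    half = exclusion // 2             # assumption: zone is centred on the peak
    peaks = []
    for i in candidates:
        centre = i + offset
        # Skip candidates that fall inside an already called exclusion zone.
        if any(start <= centre <= end for start, end, _ in peaks):
            continue
        start, end = centre - half, centre + half
        count = sum(1 for t in tags if start <= t <= end)
        # Assumption: peaks with a tag count at or below the filter are dropped.
        if count <= read_filter:
            continue
        peaks.append((start, end, count))
    return peaks
```

With the tags `[99, 100, 100, 100, 101, 300]` and the defaults above, the cluster around coordinate 100 is kept and the isolated tag at 300 is dropped; with `read_filter=0` the isolated tag is also reported as a peak.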
112 In principle, the width of the exclusion zone may be as large as the DNA region occupied by the native protein plus
113 a steric exclusion zone between the protein and the exonuclease. On the other hand, the site might be considerably
114 smaller if the protein is in a denatured state during exonuclease digestion (since it is pre-treated with SDS).
115
116 In general, higher resolution data or smaller binding site size data should use smaller sigma values. Large binding
117 site size data such as 147 bp nucleosomal DNA should use a larger sigma value such as 20 (-s 20). For transcription factors
118 mapped by ChIP-exo, sigma may initially be set at 5, and the exclusion zone set at 20 (-s 5 -e 20). Sigma is typically
119 varied between ~3 and ~20. Too high a sigma value may merge two independent nearby binding events. This may be
120 desirable if closely bound factors are not distinguishable. Too low a sigma value will cause some tags that
121 contribute to a binding event to be excluded, because they may not be located sufficiently close to the main peak.
122 If alternative (mutually exclusive) binding is expected for two overlapping sites, and these sites are to be
123 independently recorded, then an empirically determined smaller exclusion zone width should be set. Thus, the value of sigma
124 is set empirically for each mapped factor, depending upon the resolution and binding site size of the binding event.
125
126 It might make sense to exclude peaks that have only a single tag (by using -F 1), or peaks whose tags are located on
127 only a single coordinate (called Singletons, where stddev=0 in the output file). However, low coverage datasets might
128 be improved by including them, if additional analysis (e.g., motif discovery) validates them. In addition, idealized
129 action of the exonuclease in ChIP-exo might place all tags for a peak on a single coordinate.
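
The Singleton test described above can be sketched as follows. This is only an illustration: the helper name `is_singleton` is hypothetical, and it assumes the reported stddev is the population standard deviation of the tag coordinates in a peak, which the text does not specify.

```python
import statistics


def is_singleton(tags_in_peak):
    """Return True when every tag of a peak sits on one coordinate.

    Hypothetical helper; assumes stddev means the population standard
    deviation of the tag coordinates inside the peak.
    """
    return len(tags_in_peak) > 0 and statistics.pstdev(tags_in_peak) == 0
```

Under this assumption a peak built from three tags at coordinate 100 is a Singleton, as is any single-tag peak, while a peak with tags at 99 and 101 is not.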
107 130
108 ----- 131 -----
109 132
110 **Options** 133 **Options**
111 134