comparison genetrack.xml @ 15:ebafcd6c3e0e draft
Uploaded
author | greg |
---|---|
date | Wed, 16 Dec 2015 12:43:31 -0500 |
parents | cd105fdfb0da |
children | b40ad4bee6cb |
14:6ad44f393892 | 15:ebafcd6c3e0e |
---|---|
38 </conditional> | 38 </conditional> |
39 <param name="sigma" type="integer" value="5" min="1" label="Sigma to use when smoothing reads" help="Higher values increase computation but produce more smoothing." /> | 39 <param name="sigma" type="integer" value="5" min="1" label="Sigma to use when smoothing reads" help="Higher values increase computation but produce more smoothing." /> |
40 <param name="exclusion" type="integer" value="20" min="1" label="Peak exclusion zone" help="Exclusion zone around each peak that prevents others from being called." /> | 40 <param name="exclusion" type="integer" value="20" min="1" label="Peak exclusion zone" help="Exclusion zone around each peak that prevents others from being called." /> |
41 <param name="up_width" type="integer" value="10" min="0" label="Exclusion zone of upstream called peaks" /> | 41 <param name="up_width" type="integer" value="10" min="0" label="Exclusion zone of upstream called peaks" /> |
42 <param name="down_width" type="integer" value="10" min="0" label="Exclusion zone of downstream called peaks" /> | 42 <param name="down_width" type="integer" value="10" min="0" label="Exclusion zone of downstream called peaks" /> |
43 <param name="filter" type="integer" value="3" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." /> | 43 <param name="filter" type="integer" value="1" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." /> |
44 </inputs> | 44 </inputs> |
45 <outputs> | 45 <outputs> |
46 <collection name="genetrack_output" type="list" label="Genetrack results on ${on_string}"> | 46 <collection name="genetrack_output" type="list" label="Genetrack results on ${on_string}"> |
47 <discover_datasets pattern="(?P<designation>.*)" directory="output" ext="gff" visible="false" /> | 47 <discover_datasets pattern="(?P<designation>.*)" directory="output" ext="gff" visible="false" /> |
48 </collection> | 48 </collection> |
86 </test> | 86 </test> |
87 </tests> | 87 </tests> |
88 <help> | 88 <help> |
89 **What it does** | 89 **What it does** |
90 | 90 |
91 <![CDATA[ | 91 GeneTrack separately identifies peaks on the forward "+” (W) and reverse “-” (C) strand. The way that GeneTrack |
92 | 92 works is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped |
93 GeneTrack separately identifies peaks on the forward "+” and reverse “-” strand. The way that GeneTrack works | |
94 is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped | |
95 genomic coordinate. The distance decay of the probabilistic distribution is set by adjusting the value of the | 93 genomic coordinate. The distance decay of the probabilistic distribution is set by adjusting the value of the |
96 tool's **Sigma to use when smoothing reads** parameter. GeneTrack then sums the distribution over all mapped | 94 tool's **Sigma to use when smoothing reads** parameter. GeneTrack then sums the distribution over all mapped |
97 tags. This results in a smooth continuous trace that can be globally broadened or tightened by adjusting the | 95 tags. This results in a smooth continuous trace that can be globally broadened or tightened by adjusting the |
98 sigma value. GeneTrack starts with the highest smoothed peak first, treating each strand separately if indicated | 96 sigma value. GeneTrack starts with the highest smoothed peak first, treating each strand separately if indicated |
99 by the data, then sets up an exclusion zone (centered over the peak) defined by the value of the **Peak exclusion | 97 by the data, then sets up an exclusion zone (centered over the peak) defined by the value of the **Peak exclusion |
100 zone** parameter (see figure). The exclusion zone prevents any secondary peaks from being called on the same strand | 98 zone** parameter (see figure). The exclusion zone prevents any secondary peaks from being called on the same strand |
101 within that exclusion zone. In rare cases, it may be desirable to set different exclusion zones upstream (more 5’) | 99 within that exclusion zone. In rare cases, it may be desirable to set different exclusion zones upstream (more 5’) |
102 versus downstream (more 3’) of the peak. | 100 versus downstream (more 3’) of the peak. |
103 | 101 |
104 ]]> | 102 .. image:: $PATH_TO_IMAGES/genetrack.png |
105 | 103 |
106 .. image:: $PATH_TO_IMAGES/genetrack.png | 104 GeneTrack continues through the data in order of peak height, until no other peaks are found, and in principle will |
105 call a peak at a single isolated tag, if no filter is set using the tool's **Absolute read filter** parameter. A | |
106 filter value of 1 means that it will stop calling peaks when the tag count in the peak hits 1 (so single tag peaks | |
107 will be excluded in this case). GeneTrack outputs **chrom** (chromosome number), **strand** (+/W or -/C strand), | |
108 **start** (lower coordinate of exclusion zone), **end** (higher coordinate of exclusion zone), and **value** (peak | |
109 height). Genetrack's GFF output reports the start (lower coordinate) and end (higher coordinate) of the exclusion | |
110 zone. | |
111 | |
112 In principle, the width of the exclusion zone may be as large as the DNA region occupied by the native protein plus | |
113 a steric exclusion zone between the protein and the exonuclease. On the other hand the site might be considerably | |
114 smaller if the protein is in a denatured state during exonuclease digestion (since it is pre-treated with SDS). | |
115 | |
116 In general, higher resolution data or smaller binding site size data should use smaller sigma values. Large binding | |
117 site size data such as 147 bp nucleosomal DNA use a larger sigma value like 20 (-s 20). For transcription factors | |
118 mapped by ChIP-exo, sigma may initially be set at 5, and the exclusion zone set at 20 (-s 5 –e 20). Sigma is typically | |
119 varied between ~3 and ~20. Too high of a sigma value may merge two independent nearby binding events. This may be | |
120 desirable if closely bound factors are not distinguishable. Too low of a sigma value will cause some tags that | |
121 contribute to a binding event to be excluded, because they may not be located sufficiently close to the main peak. | |
122 If alternative (mutually exclusive) binding is expected for two overlapping sites, and these sites are to be | |
123 independently recorded, then an empirically determined smaller exclusion zone width is set. Thus the value of sigma | |
124 is set empirically for each mapped factor, depending upon the resolution and binding site size of the binding event. | |
125 | |
126 It might make sense to exclude peaks that have only a single tag, where -F 1 is used, or have their tags located on | |
127 only a single coordinate (called Singletons, where stddev=0 in the output file). However, low coverage datasets might | |
128 be improved by including them, if additional analysis (e.g., motif discovery) validates them. In addition, idealized | |
129 action of the exonuclease in ChIP-exo might place all tags for a peak on a single coordinate. | |
107 | 130 |
108 ----- | 131 ----- |
109 | 132 |
110 **Options** | 133 **Options** |
111 | 134 |
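
The help text added in this revision describes GeneTrack's peak calling only in prose. The Python sketch below illustrates that procedure under stated assumptions and is not GeneTrack's own code:

- Each tag is replaced by a Gaussian of width sigma. The truncation at 3 sigma is an assumption.
- The Gaussians are summed per strand into a smoothed trace.
- Local maxima are accepted from highest to lowest unless they fall inside an exclusion zone that has already been called.

Several details are hypothetical:

- The names `smooth`, `call_peaks` and `Peak` are invented for this sketch.
- Reading the **Absolute read filter** as "keep a peak only if more than `read_filter` tags fall inside its exclusion zone" is an interpretation.
- When `up_width`/`down_width` are not given, the exclusion zone is split evenly on each side of the peak. With the XML defaults this gives 20 → 10/10.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Peak(NamedTuple):
    chrom: str
    strand: str   # "+" (W) or "-" (C)
    start: int    # lower coordinate of the exclusion zone
    end: int      # higher coordinate of the exclusion zone
    value: float  # smoothed peak height
    tags: int     # tags falling inside the exclusion zone


def smooth(tags: List[int], sigma: int) -> Dict[int, float]:
    """Sum one Gaussian (width sigma, truncated at 3 sigma) per tag into a trace."""
    trace: Dict[int, float] = {}
    reach = 3 * sigma  # assumption: ignore contributions beyond 3 sigma
    for pos in tags:
        for x in range(pos - reach, pos + reach + 1):
            weight = math.exp(-((x - pos) ** 2) / (2.0 * sigma * sigma))
            trace[x] = trace.get(x, 0.0) + weight
    return trace


def call_peaks(chrom: str, strand: str, tags: List[int], sigma: int = 5,
               exclusion: int = 20, up_width: Optional[int] = None,
               down_width: Optional[int] = None, read_filter: int = 1) -> List[Peak]:
    """Hypothetical greedy peak caller for one strand, highest smoothed peak first."""
    # Assumption: without explicit widths, split the exclusion zone evenly.
    up = exclusion // 2 if up_width is None else up_width
    down = exclusion // 2 if down_width is None else down_width
    trace = smooth(tags, sigma)
    # Candidates are local maxima of the trace (left edge of any plateau).
    candidates = [x for x, h in trace.items()
                  if h > trace.get(x - 1, 0.0) and h >= trace.get(x + 1, 0.0)]
    candidates.sort(key=lambda x: trace[x], reverse=True)

    peaks: List[Peak] = []
    for x in candidates:
        # No secondary peak is called inside an already-called exclusion zone.
        if any(p.start <= x <= p.end for p in peaks):
            continue
        # Upstream (5') is toward lower coordinates on "+", higher on "-".
        if strand == "+":
            start, end = x - up, x + down
        else:
            start, end = x - down, x + up
        count = sum(1 for t in tags if start <= t <= end)
        # Assumption: the absolute read filter drops peaks with <= read_filter tags.
        if count <= read_filter:
            continue
        peaks.append(Peak(chrom, strand, start, end, trace[x], count))
    return peaks


if __name__ == "__main__":
    tags = [98, 99, 100, 101, 102, 199, 200, 201, 300]
    for p in call_peaks("chr1", "+", tags):
        print(p)
```

The sample tags produce three candidates:

- With the new default filter of 1, the sketch reports two peaks: a five-tag peak with zone 90 to 110 and a three-tag peak with zone 190 to 210.
- Calling it with `read_filter=3`, the default before this revision, keeps only the five-tag peak.
- The single tag at 300 is dropped in both cases.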