Mercurial > repos > greg > genetrack
changeset 15:ebafcd6c3e0e draft
Uploaded
author | greg |
---|---|
date | Wed, 16 Dec 2015 12:43:31 -0500 |
parents | 6ad44f393892 |
children | b40ad4bee6cb |
files | genetrack.xml |
diffstat | 1 files changed, 30 insertions(+), 7 deletions(-) |
--- a/genetrack.xml	Wed Dec 16 12:43:22 2015 -0500
+++ b/genetrack.xml	Wed Dec 16 12:43:31 2015 -0500
@@ -40,7 +40,7 @@
         <param name="exclusion" type="integer" value="20" min="1" label="Peak exclusion zone" help="Exclusion zone around each peak that prevents others from being called." />
         <param name="up_width" type="integer" value="10" min="0" label="Exclusion zone of upstream called peaks" />
         <param name="down_width" type="integer" value="10" min="0" label="Exclusion zone of downstream called peaks" />
-        <param name="filter" type="integer" value="3" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." />
+        <param name="filter" type="integer" value="1" min="0" label="Absolute read filter" help="Removes peaks with lower peak height." />
     </inputs>
     <outputs>
         <collection name="genetrack_output" type="list" label="Genetrack results on ${on_string}">
@@ -88,10 +88,8 @@
     <help>
 **What it does**
 
-<![CDATA[
-
-GeneTrack separately identifies peaks on the forward "+” and reverse “-” strand. The way that GeneTrack works
-is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped
+GeneTrack separately identifies peaks on the forward "+” (W) and reverse “-” (C) strand. The way that GeneTrack
+works is to replace each tag with a probabilistic distribution of occurrences for that tag at and around its mapped
 genomic coordinate. The distance decay of the probabilistic distribution is set by adjusting the value of the tool's
 **Sigma to use when smoothing reads** parameter. GeneTrack then sums the distribution over all mapped tags. This
 results in a smooth continuous trace that can be globally broadened or tightened by adjusting the
@@ -101,9 +99,34 @@
 within that exclusion zone. In rare cases, it may be desirable to set different exclusion zones upstream (more 5’)
 versus downstream (more 3’) of the peak.
 
-]]>
+.. image:: $PATH_TO_IMAGES/genetrack.png
+
+GeneTrack continues through the data in order of peak height, until no other peaks are found, and in principle will
+call a peak at a single isolated tag, if no filter is set using the tool's **Absolute read filter** parameter. A
+filter value of 1 means that it will stop calling peaks when the tag count in the peak hits 1 (so single tag peaks
+will be excluded in this case). GeneTrack outputs **chrom** (chromosome number), **strand** (+/W or -/C strand),
+**start** (lower coordinate of exclusion zone), **end** (higher coordinate of exclusion zone), and **value** (peak
+height). Genetrack's GFF output reports the start (lower coordinate) and end (higher coordinate) of the exclusion
+zone.
+
+In principle, the width of the exclusion zone may be as large as the DNA region occupied by the native protein plus
+a steric exclusion zone between the protein and the exonuclease. On the other hand the site might be considerably
+smaller if the protein is in a denatured state during exonuclease digestion (since it is pre-treated with SDS).
 
-.. image:: $PATH_TO_IMAGES/genetrack.png
+In general, higher resolution data or smaller binding site size data should use smaller sigma values. Large binding
+site size data such as 147 bp nucleosomal DNA use a larger sigma value like 20 (-s 20). For transcription factors
+mapped by ChIP-exo, sigma may initially be set at 5, and the exclusion zone set at 20 (-s 5 –e 20). Sigma is typically
+varied between ~3 and ~20. Too high of a sigma value may merge two independent nearby binding events. This may be
+desirable if closely bound factors are not distinguishable. Too low of a sigma value will cause some tags that
+contribute to a binding event to be excluded, because they may not be located sufficiently close to the main peak.
+If alternative (mutually exclusive) binding is expected for two overlapping sites, and these sites are to be
+independently recorded, then an empirically determined smaller exclusion zone width is set. Thus the value of sigma
+is set empirically for each mapped factor, depending upon the resolution and binding site size of the binding event.
+
+It might make sense to exclude peaks that have only a single tag, where -F 1 is used, or have their tags located on
+only a single coordinate (called Singletons, where stddev=0 in the output file). However, low coverage datasets might
+be improved by including them, if additional analysis (e.g., motif discovery) validates them. In addition, idealized
+action of the exonuclease in ChIP-exo might place all tags for a peak on a single coordinate.
 
 -----
 
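
The help text added in this changeset describes smoothing each tag with a distribution whose decay is set by sigma, summing over all tags, then calling peaks in order of height while blocking an exclusion zone around each called peak. The following is a minimal Python sketch of that idea for a single strand, not GeneTrack's actual implementation. The function names `smooth` and `call_peaks`, the Gaussian kernel truncated at 5 sigma, the symmetric split of the exclusion zone, and the comparison of smoothed height against the filter value are all assumptions made for illustration.

```python
import math


def smooth(tag_counts, sigma, width_factor=5):
    """Sum an unnormalised Gaussian centred on every mapped tag.

    tag_counts[i] is the number of tags mapped to coordinate i.
    The kernel is truncated at width_factor * sigma (an assumption).
    """
    n = len(tag_counts)
    half = int(width_factor * sigma)
    kernel = [math.exp(-(d * d) / (2.0 * sigma * sigma))
              for d in range(-half, half + 1)]
    trace = [0.0] * n
    for pos, count in enumerate(tag_counts):
        if count == 0:
            continue
        for k, weight in enumerate(kernel):
            i = pos + k - half
            if 0 <= i < n:
                trace[i] += count * weight
    return trace


def call_peaks(trace, exclusion, min_height=0.0):
    """Greedy peak calling in order of decreasing peak height.

    A local maximum is skipped if it lies inside the exclusion zone of a
    previously called (higher) peak. Calling stops once the height drops
    to min_height or below, mimicking the "absolute read filter".
    Returns (start, end, height) tuples sorted by start coordinate.
    """
    n = len(trace)
    half = exclusion // 2  # assumed: zone split evenly around the peak

    def is_local_max(i):
        left = trace[i - 1] if i > 0 else 0.0
        right = trace[i + 1] if i < n - 1 else 0.0
        return trace[i] > left and trace[i] >= right

    candidates = [i for i in range(n) if is_local_max(i)]
    candidates.sort(key=lambda i: trace[i], reverse=True)

    blocked = [False] * n
    peaks = []
    for idx in candidates:
        if trace[idx] <= min_height:
            break  # all remaining candidates are at or below the filter
        if blocked[idx]:
            continue
        start = max(0, idx - half)
        end = min(n - 1, idx + half)
        peaks.append((start, end, trace[idx]))
        for i in range(start, end + 1):
            blocked[i] = True
    return sorted(peaks)
```

With five tags around coordinate 50 and one isolated tag at coordinate 100, `call_peaks(smooth(counts, sigma=5), exclusion=20)` reports both sites, while passing `min_height=1` drops the single-tag peak. This corresponds to the behaviour the help text describes for a filter value of 1.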