comparison weeder2_wrapper.xml @ 0:fb7e680232af draft

Uploaded v2.0.2
author pjbriggs
date Mon, 05 Mar 2018 06:21:11 -0500
parents
children 79f075dc6cfb
-1:000000000000 0:fb7e680232af
1 <tool id="motiffinding_weeder2" name="Weeder2" version="2.0.2">
2 <description>Motif discovery in sequences from coregulated genes of a single species</description>
3 <macros>
4 <import>weeder2_macros.xml</import>
5 </macros>
6 <requirements>
7 <requirement type="package" version="2.0">weeder</requirement>
8 </requirements>
9 <command><![CDATA[
10 @CONDA_WEEDER2_FREQFILES_PATH@ &&
11 bash $__tool_directory__/weeder2_wrapper.sh
12 $sequence_file $species_code ${species_code.fields.path}
13 $output_motifs_file $output_matrix_file
14 $strands
15 #if $chipseq.use_chipseq
16 -chipseq -top $chipseq.top
17 #end if
18 #if str( $advanced_options.advanced_options_selector ) == "on"
19 -maxm $advanced_options.n_motifs_report
20 -b $advanced_options.n_motifs_build
21 -sim $advanced_options.sim_threshold
22 -em $advanced_options.em_cycles
23 #end if
24 ]]></command>
25 <inputs>
26 <param name="sequence_file" type="data" format="fasta" label="Input sequence" />
27 <param name="species_code" type="select" label="Species to use for background comparison">
28 <options from_data_table="weeder2">
29 </options>
30 </param>
31 <param name="strands" label="Use both strands of sequence" type="boolean"
32 truevalue="" falsevalue="-ss" checked="True"
33 help="If not checked then use -ss option" />
34 <conditional name="chipseq">
35 <param name="use_chipseq" type="boolean"
36 label="Use the ChIP-seq heuristic"
37 help="Speeds up the computation (-chipseq)"
38 truevalue="yes" falsevalue="no" checked="on" />
39 <when value="yes">
40 <param name="top" type="integer" value="100"
41 label="Number of top input sequences with oligos to scan for"
42 help="Increase this value to improve the chance of finding motifs enriched only in a subset of your input sequences (-top)" />
43 </when>
44 <when value="no"></when>
45 </conditional>
46 <conditional name="advanced_options">
47 <param name="advanced_options_selector" type="select"
48 label="Display advanced options">
49 <option value="off">Hide</option>
50 <option value="on">Display</option>
51 </param>
52 <when value="on">
53 <param name="n_motifs_report" type="integer" value="25"
54 label="Number of discovered motifs to report" help="(-maxm)" />
55 <param name="n_motifs_build" type="integer" value="50"
56 label="Number of top scoring motifs to build occurrences matrix profiles and outputs for"
57 help="(-b)" />
58 <param name="sim_threshold" type="float" min="0.0" max="1.0" value="0.95"
59 label="Similarity threshold for the redundancy filter"
60 help="Remove motifs that are too similar, with lower values imposing a stricter filter. Must be between 0.0 and 1.0 (-sim)" />
61 <param name="em_cycles" type="integer" min="0" max="100" value="1"
62 label="Number of expectation maximization (EM) cycles to perform"
63 help="Number of cycles must be between 0 and 100 (-em)" />
64 </when>
65 <when value="off">
66 </when>
67 </conditional>
68 </inputs>
69 <outputs>
70 <data name="output_motifs_file" format="txt" label="Weeder2 on ${on_string} (motifs)" />
71 <data name="output_matrix_file" format="txt" label="Weeder2 on ${on_string} (matrix)" />
72 </outputs>
73 <tests>
74 <test>
75 <param name="sequence_file" value="weeder_in.fa" ftype="fasta" />
76 <param name="species_code" value="MM" />
77 <output name="output_motifs_file" file="weeder2_motifs.out" lines_diff="2" />
78 <output name="output_matrix_file" file="weeder2_matrix.out" />
79 </test>
80 </tests>
81 <help>
82
83 .. class:: infomark
84
85 **What it does**
86
87 Weeder2 is a program for finding novel motifs (transcription factor binding sites)
88 conserved in a set of regulatory regions of related genes.
89
90 -------------
91
92 .. class:: infomark
93
94 **Usage advice**
95
96 Detailed guidelines for this tool are given in Zambelli et al. 2014 (see link
97 below); what follows is a brief guide. Note that a **motif** is a model or matrix
98 describing a set of sequences that may differ in base composition, whereas
99 **oligos** are specific sequences found within the input sequences or the genomic
100 background.
101
102 **Input sequence** (in FASTA format) should be short (100-200 bp) and reasonably
103 expected to contain one or more enriched motifs. This is generally the case for
104 transcription factor ChIP-seq sequences centred on the summits of binding regions,
105 which are expected to contain a dominant motif and possibly secondary motifs.
106
107 There is **no need to mask sequence for repetitive sequence** as factors may
108 legitimately bind repetitive sequence.
109
110 **Use both strands of sequence** by default, unless there is a specific reason not
111 to do so.
112
113 **Species to use for background comparison** should match the genome used to
114 generate the **input sequence**. The background genome motif frequencies are
115 generated from within the promoter regions of annotated genes and are shown to be a
116 good background for both promoter and other regulatory regions.
117
118 **Use the ChIP-seq heuristic** (-chipseq) when there is a large number of
119 input sequences (hundreds or thousands). With -chipseq, Weeder builds motifs
120 using only oligos from the first -top sequences (100 by default) and then scans
121 all of the input sequences with them. This reduces the run time without much
122 risk of losing important motifs. Although not strictly necessary, it is advisable to
123 order input sequences by their significance, e.g. fold enrichment or p-value. For
124 large data sets, -top should be set to at least 10 to 20% of the number of
125 input sequences (as recommended by the authors).
126
127 **Number of discovered motifs to report** (-maxm) caps the number of reported
128 motifs, even if more are found. **Number of top scoring motifs to build
129 occurrences matrix profiles and outputs for** (-b) sets the number of top
130 scoring motifs of length 6, 8 and 10 for which the occurrence matrix is built.
131 Increasing -b may yield more reported motifs, but more of them are likely to be
132 of low significance, and the run time increases. If increasing -b does
133 not produce more motifs in your results, then either the additional motifs were
134 filtered out by the redundancy filter or the maximum number of reported motifs
135 set by -maxm has been reached.
136
137 **Similarity threshold for the redundancy filter** (-sim): the default setting is
138 recommended.
139
140 **Number of expectation maximization (EM) cycles to perform** (-em): the default is
141 recommended. The option is included to help "clean up" the resulting motif matrices.
142 In this version the number of EM steps can be increased, which can be useful for
143 motifs with highly redundant stretches of sequence.
144
145 -------------
146
147 .. class:: infomark
148
149 **A note on the results**
150
151 The matrices are produced by scanning (by default on both strands) for
152 oligos of length 6, 8 and 10, allowing 1, 2 and 3 substitutions respectively. The
153 matrices within the matrix.w2 file can be input into other tools. The recommended
154 next step is to use **STAMP** (http://www.benoslab.pitt.edu/stamp/), which displays
155 the motifs as logos and identifies matches with libraries of known DNA binding
156 motifs, such as TRANSFAC or JASPAR.
157
158 -------------
159
160 .. class:: infomark
161
162 **Credits**
163
164 This Galaxy tool has been developed by Peter Briggs and Ian Donaldson within the
165 Bioinformatics Core Facility at the University of Manchester, and runs the Weeder2
166 motif discovery package:
167
168 * Zambelli, F., Pesole, G. and Pavesi, G. 2014. Using Weeder, Pscan, and PscanChIP
169 for the Discovery of Enriched Transcription Factor Binding Site Motifs in
170 Nucleotide Sequences. Current Protocols in Bioinformatics. 47:2.11:2.11.1–2.11.31.
171 * http://onlinelibrary.wiley.com/doi/10.1002/0471250953.bi0211s47/full
172
173 This tool is compatible with Weeder 2.0:
174
175 * http://159.149.160.51/modtools/downloads/weeder2.html
176
177 Please acknowledge this Galaxy tool, the Weeder package and the utility
178 scripts if you use them in your work.
179 </help>
180 <citations>
181 <!--
182 See https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#A.3Ccitations.3E_tag_set
183 Can be either DOI or Bibtex
184 Use http://www.bioinformatics.org/texmed/ to convert PubMed to Bibtex
185 -->
186 <citation type="doi">10.1002/0471250953.bi0211s47</citation>
187 </citations>
188 </tool>