Amplicon_analysis-galaxy
========================

A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
script at https://github.com/MTutino/Amplicon_analysis

The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
(Casava >= 1.8) and performs the following operations:

* QC and clean up of input data
* Removal of singletons and chimeras and building of OTU table
  and phylogenetic tree
* Beta and alpha diversity analysis

Usage documentation
===================

Usage of the tool (including required inputs) is documented within
the ``help`` section of the tool XML.

Installing the tool in a Galaxy instance
========================================

The following sections describe how to install the tool files,
dependencies and reference data, and how to configure the Galaxy
instance to detect the dependencies and reference data correctly
at run time.

1. Install the tool from the toolshed
-------------------------------------

The core tool is hosted on the Galaxy toolshed, so it can be installed
directly from there (this is the recommended route):

* https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/

Alternatively it can be installed manually; in this case there are two
files to install:

* ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
* ``amplicon_analysis_pipeline.py`` (the Python wrapper script)

Put these in a directory that is visible to Galaxy (e.g. a
``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
file to tell Galaxy to offer the tool by adding a line such as::

    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />

2. Install the reference data
-----------------------------

The script ``References.sh`` from the pipeline package at
https://github.com/MTutino/Amplicon_analysis can be run to install
the reference data, for example::

    cd /path/to/pipeline/data
    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
    /bin/bash ./References.sh

will install the data in ``/path/to/pipeline/data``.

**NB** The final amount of data downloaded and uncompressed will be
around 9 GB.

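Because the download is sizeable, it can be worth confirming that the
target filesystem has enough room before running ``References.sh``. The
following is a minimal sketch and not part of the pipeline: the function
name ``has_free_space`` is hypothetical, and it assumes the POSIX
``df -Pk`` output format (available space in the fourth column).

```shell
#!/bin/bash
# Hypothetical helper (not part of Amplicon_analysis): succeed if the
# filesystem holding directory $1 has at least $2 GB available.
has_free_space() {
    local dir="$1" required_gb="$2"
    local avail_kb
    # df -Pk prints one header line, then a line whose 4th field is the
    # available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    # No output means df could not inspect the directory.
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}
```

For example, ``has_free_space /path/to/pipeline/data 9 || echo "Not enough space"``
could be run before fetching the data.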
3. Configure reference data location in Galaxy
----------------------------------------------

The final step is to make your Galaxy installation aware of the
location of the reference data, so it can locate the data when the
tool is run.

The tool locates the reference data via an environment variable called
``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
directory where the reference data has been installed.

There are various ways to do this, depending on how your Galaxy
installation is configured:

* **For local instances:** add a line to set the variable in the
  ``config/local_env.sh`` file of your Galaxy installation (you
  may need to create a new empty file first), e.g.::

      export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data

* **For production instances:** set the value in the ``job_conf.xml``
  configuration file, e.g.::

      <destination id="amplicon_analysis">
         <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
      </destination>

  and then specify that the pipeline tool uses this destination::

      <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>

  (For more about job destinations see the Galaxy documentation at
  https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)

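A quick sanity check of this setting can save a failed job. The sketch
below is illustrative only: ``check_ref_data_path`` is a hypothetical
helper name, and it only tests that the variable is set and names an
existing directory, not that the reference data inside it is complete.

```shell
#!/bin/bash
# Hypothetical helper: verify that AMPLICON_ANALYSIS_REF_DATA_PATH is set
# and points at an existing directory. Prints a diagnostic either way.
check_ref_data_path() {
    local ref_dir="${AMPLICON_ANALYSIS_REF_DATA_PATH:-}"
    if [ -z "$ref_dir" ]; then
        echo "AMPLICON_ANALYSIS_REF_DATA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$ref_dir" ]; then
        echo "Not a directory: $ref_dir" >&2
        return 1
    fi
    echo "Reference data directory: $ref_dir"
}
```

The check is only meaningful when run in the same environment that Galaxy
jobs see, for instance in a shell that has sourced ``config/local_env.sh``.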
4. Enable rendering of HTML outputs from pipeline
-------------------------------------------------

To ensure that HTML outputs are displayed correctly in Galaxy
(for example the Vsearch OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the ``Amplicon_analysis``
tool.

Either:

* **For local instances:** set ``sanitize_all_html = False`` in
  ``config/galaxy.ini`` (NB: don't do this on production servers or
  public instances!); or

* **For production instances:** add the ``Amplicon_analysis`` tool
  to the display whitelist in the Galaxy instance:

  - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
    ``config/galaxy.ini`` and restart Galaxy;
  - Go to ``Admin>Manage Display Whitelist``, check the box for
    ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
    search function to help locate it) and click on
    ``Submit new whitelist`` to update the settings.

Additional details
==================

Some other things to be aware of:

* Note that using the SILVA database requires a minimum of 18 GB of RAM

Known problems
==============

* Only the ``VSEARCH`` pipeline in Mauro's script is currently
  available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
  pipelines have yet to be implemented.
* The images in the tool help section are not visible if the
  tool has been installed locally, or if it has been installed in
  a Galaxy instance which is served from a subdirectory.

  These are both problems with Galaxy and not the tool; see
  https://github.com/galaxyproject/galaxy/issues/4490 and
  https://github.com/galaxyproject/galaxy/issues/1676

Appendix: installing the dependencies manually
==============================================

If the tool is installed from the Galaxy toolshed (recommended) then
the dependencies should be installed automatically and this step can
be skipped.

Otherwise the ``install_amplicon_analysis_deps.sh`` script can be used
to fetch and install the dependencies locally, for example::

    install_amplicon_analysis.sh /path/to/local_tool_dependencies

(This is the same script as is used to install dependencies from the
toolshed.) This can take some time to complete, and when completed will
have created a directory called ``Amplicon_analysis-1.2.3`` containing
the dependencies under the specified top level directory.

**NB** The installed dependencies will occupy around 2.6 GB of disk
space.

You will need to make sure that the ``bin`` subdirectory of this
directory is on Galaxy's ``PATH`` at runtime so that the tool can
access the dependencies, for example by adding a line to the
``local_env.sh`` file like::

    export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH

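To confirm that the ``PATH`` change has taken effect, something along
these lines may help. This is a sketch under assumptions:
``path_contains`` is a hypothetical helper, and it only does a literal
string comparison against the entries of a colon-separated list (by
default ``$PATH``), so symlinks and trailing slashes are not normalised.

```shell
#!/bin/bash
# Hypothetical helper: succeed if directory $1 appears as an entry in the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    local dir="$1" list="${2:-$PATH}"
    # Wrapping both sides in colons lets one pattern match the first,
    # last, and middle entries alike.
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example,
``path_contains /path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin || echo "dependencies not on PATH"``
reports a missing entry in the current shell.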
History
=======

========== ======================================================================
Version    Changes
========== ======================================================================
1.3.5.0    Updated to Amplicon_Analysis_Pipeline version 1.3.5.
1.2.3.0    Updated to Amplicon_Analysis_Pipeline version 1.2.3; install
           dependencies via tool_dependencies.xml.
1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
           jackknifed analysis which is not captured by Galaxy tool)
1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
           option to use the Human Oral Microbiome Database v15.1, and
           updates SILVA database to v123)
1.1.0      First official version on Galaxy toolshed.
1.0.6      Expand inline documentation to provide detailed usage guidance.
1.0.5      Updates including:

           - Capture read counts from quality control as new output dataset
           - Capture FastQC per-base quality boxplots for each sample as
             new output dataset
           - Add support for -l option (sliding window length for trimming)
           - Default for -L set to "200"
1.0.4      Various updates:

           - Additional outputs are captured when a "Categories" file is
             supplied (alpha diversity rarefaction curves and boxplots)
           - Sample names derived from Fastqs in a collection of pairs
             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
           - Input Fastqs can now be of more general ``fastq`` type
           - Log file outputs are captured in new output dataset
           - User can specify a "title" for the job which is copied into
             the dataset names (to distinguish outputs from different runs)
           - Improved detection and reporting of problems with input
             Metatable
1.0.3      Take the sample names from the collection dataset names when
           using collection as input (this is now the default input mode);
           collect additional output dataset; disable ``usearch``-based
           pipelines (i.e. ``UPARSE`` and ``QIIME``).
1.0.2      Enable support for FASTQs supplied via dataset collections and
           fix some broken output datasets.
1.0.1      Initial version
========== ======================================================================