``Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst`` @ 41:7b9786a43a16 (draft)

:Commit message: Uploaded test version 1.3.5.0.
:Author: pjbriggs
:Date: Thu, 05 Dec 2019 11:44:03 +0000

Amplicon_analysis-galaxy
========================

A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
script at https://github.com/MTutino/Amplicon_analysis

The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
(Casava >= 1.8) and performs the following operations:

* QC and clean-up of input data
* Removal of singletons and chimeras, and building of the OTU table
  and phylogenetic tree
* Beta and alpha diversity analysis

Usage documentation
===================

Usage of the tool (including required inputs) is documented within
the ``help`` section of the tool XML.

Installing the tool in a Galaxy instance
========================================

The following sections describe how to install the tool files,
dependencies and reference data, and how to configure the Galaxy
instance to detect the dependencies and reference data correctly
at run time.

1. Install the tool from the toolshed
-------------------------------------

The core tool is hosted on the Galaxy toolshed, so it can be installed
directly from there (this is the recommended route):

* https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/

Alternatively it can be installed manually; in this case there are two
files to install:

* ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
* ``amplicon_analysis_pipeline.py`` (the Python wrapper script)

Put these in a directory that is visible to Galaxy (e.g. a
``tools/Amplicon_analysis/`` folder), and tell Galaxy to offer the
tool by adding a line such as the following to its ``tool_conf.xml``
file::

    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
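
For a manual install, the copy step can be scripted. The following is a minimal POSIX shell sketch rather than part of the tool: the function name ``install_tool_files`` is hypothetical, and it assumes Galaxy's tool directory is the default ``tools/`` folder under the Galaxy root. The ``<tool>`` entry shown above still has to be added to the tool configuration by hand.

```shell
# Hypothetical helper (not shipped with the tool): copy the two tool
# files from SRC_DIR (default: current directory) into
# GALAXY_ROOT/tools/Amplicon_analysis/, matching the <tool file=...>
# path shown above. Assumes tools/ is Galaxy's tool directory.
install_tool_files() {
    local galaxy_root=$1
    local src_dir=${2:-.}
    local dest="$galaxy_root/tools/Amplicon_analysis"
    local f
    if [ -z "$galaxy_root" ]; then
        echo "Usage: install_tool_files GALAXY_ROOT [SRC_DIR]" >&2
        return 1
    fi
    # Check that both files exist before creating or copying anything
    for f in amplicon_analysis_pipeline.xml amplicon_analysis_pipeline.py; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "Missing $src_dir/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dest" || return 1
    for f in amplicon_analysis_pipeline.xml amplicon_analysis_pipeline.py; do
        cp "$src_dir/$f" "$dest/" || return 1
    done
    echo "Installed tool files in $dest"
}
```

For example, ``install_tool_files /path/to/galaxy`` run from the directory containing the two files copies them into ``/path/to/galaxy/tools/Amplicon_analysis/``.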

2. Install the reference data
-----------------------------

The script ``References.sh`` from the pipeline package at
https://github.com/MTutino/Amplicon_analysis can be run to install
the reference data. For example, the following commands will install
the data in ``/path/to/pipeline/data``::

    cd /path/to/pipeline/data
    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
    /bin/bash ./References.sh

**NB** The final amount of data downloaded and uncompressed will be
around 9GB.
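
Because of the download size, it may be worth checking the free space at the target location before running ``References.sh``. A minimal sketch, assuming a POSIX ``df``; the function name ``check_free_space`` and its default threshold of 10 (GiB, an assumed margin over the figure above) are hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if the filesystem holding
# DIR has at least MIN_GB GiB free. The default of 10 is an assumed
# safety margin over the ~9GB quoted for the reference data.
check_free_space() {
    local dir=$1
    local min_gb=${2:-10}
    local avail_kb
    # POSIX df output (-P) in 1024-byte blocks (-k): line 2, field 4
    # is the space available to unprivileged users.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    if [ -z "$avail_kb" ]; then
        echo "Cannot determine free space for $dir" >&2
        return 1
    fi
    if [ "$avail_kb" -lt $((min_gb * 1024 * 1024)) ]; then
        echo "Only $((avail_kb / 1048576))GiB free in $dir (need $min_gb)" >&2
        return 1
    fi
    echo "OK: $((avail_kb / 1048576))GiB free in $dir"
}
```

For example, ``check_free_space /path/to/pipeline/data && /bin/bash ./References.sh`` only starts the download if the check passes.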

3. Configure reference data location in Galaxy
----------------------------------------------

This step makes your Galaxy installation aware of the location of the
reference data, so that the tool can find it when it is run.

The tool locates the reference data via an environment variable called
``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
directory where the reference data has been installed.

There are various ways to do this, depending on how your Galaxy
installation is configured:

* **For local instances:** add a line to set it in the
  ``config/local_env.sh`` file of your Galaxy installation (you
  may need to create a new empty file first), e.g.::

    export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data

* **For production instances:** set the value in the ``job_conf.xml``
  configuration file, e.g.::

    <destination id="amplicon_analysis">
       <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
    </destination>

  and then specify that the pipeline tool uses this destination::

    <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>

  (For more about job destinations see the Galaxy documentation at
  https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)
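
Whichever method is used, a quick check run in the same environment as Galaxy's jobs can confirm that the variable is set and points at an existing directory. This is a sketch only: ``check_ref_data_path`` is a hypothetical name, and it does not verify the contents of the directory.

```shell
# Hypothetical sanity check for AMPLICON_ANALYSIS_REF_DATA_PATH:
# fails if the variable is unset/empty or is not an existing directory.
# It does not inspect what the directory contains.
check_ref_data_path() {
    if [ -z "${AMPLICON_ANALYSIS_REF_DATA_PATH:-}" ]; then
        echo "AMPLICON_ANALYSIS_REF_DATA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$AMPLICON_ANALYSIS_REF_DATA_PATH" ]; then
        echo "$AMPLICON_ANALYSIS_REF_DATA_PATH is not a directory" >&2
        return 1
    fi
    echo "Reference data path: $AMPLICON_ANALYSIS_REF_DATA_PATH"
}
```

For a local instance, for example: ``. config/local_env.sh && check_ref_data_path``.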

4. Enable rendering of HTML outputs from pipeline
-------------------------------------------------

To ensure that HTML outputs are displayed correctly in Galaxy
(for example the VSEARCH OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the ``Amplicon_analysis``
tool.

Either:

* **For local instances:** set ``sanitize_all_html = False`` in
  ``config/galaxy.ini`` (NB do not do this on production servers or
  public instances!); or

* **For production instances:** add the ``Amplicon_analysis`` tool
  to the display whitelist in the Galaxy instance:

  - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
    ``config/galaxy.ini`` and restart Galaxy;
  - Go to ``Admin>Manage Display Whitelist``, check the box for
    ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
    search function to help locate it) and click on
    ``Submit new whitelist`` to update the settings.
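
After submitting the new whitelist, the file named by ``sanitize_whitelist_file`` can be searched for the tool. This sketch assumes the file lists one tool ID per line and matches on the substring ``amplicon_analysis_pipeline`` (the tool ID used in the ``job_conf.xml`` example above), so that longer toolshed-style IDs also match; both points are assumptions, and ``tool_is_whitelisted`` is a hypothetical name.

```shell
# Hypothetical check: does the display whitelist mention the tool?
# Assumes one tool ID per line; matches PATTERN as a fixed substring.
tool_is_whitelisted() {
    local whitelist=${1:-config/whitelist.txt}
    local pattern=${2:-amplicon_analysis_pipeline}
    if [ ! -f "$whitelist" ]; then
        echo "No whitelist file at $whitelist" >&2
        return 1
    fi
    grep -q -F -- "$pattern" "$whitelist"
}
```

For example: ``tool_is_whitelisted config/whitelist.txt && echo "whitelisted"``.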

Additional details
==================

Some other things to be aware of:

* Using the SILVA database requires a minimum of 18GB of RAM.
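
On Linux hosts, whether a machine meets this requirement can be checked from the ``MemTotal`` line of ``/proc/meminfo``. A minimal, Linux-specific sketch; ``has_min_ram`` is a hypothetical name, and it checks total rather than currently free memory.

```shell
# Hypothetical Linux-only check that total RAM is at least MIN_GB GiB
# (default 18, the figure quoted for SILVA). MemTotal in /proc/meminfo
# is reported in kB (1024-byte units). MEMINFO can be overridden.
has_min_ram() {
    local min_gb=${1:-18}
    local meminfo=${2:-/proc/meminfo}
    local total_kb
    total_kb=$(awk '$1 == "MemTotal:" { print $2 }' "$meminfo" 2>/dev/null)
    [ -n "$total_kb" ] || return 1
    [ "$total_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example: ``has_min_ram 18 && echo "enough RAM for SILVA"``. ``MemTotal`` is usually a little below the installed RAM, so borderline results are worth checking by hand.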

Known problems
==============

* Only the ``VSEARCH`` pipeline in Mauro's script is currently
  available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
  pipelines have yet to be implemented.
* The images in the tool help section are not visible if the
  tool has been installed locally, or if it has been installed in
  a Galaxy instance which is served from a subdirectory.

  Both of these image problems are issues with Galaxy rather than
  with the tool; see
  https://github.com/galaxyproject/galaxy/issues/4490 and
  https://github.com/galaxyproject/galaxy/issues/1676

Appendix: installing the dependencies manually
==============================================

If the tool is installed from the Galaxy toolshed (recommended) then
the dependencies should be installed automatically and this step can
be skipped.

Otherwise the ``install_amplicon_analysis_deps.sh`` script can be used
to fetch and install the dependencies locally, for example::

    install_amplicon_analysis.sh /path/to/local_tool_dependencies

(This is the same script that is used to install dependencies from the
toolshed.) This can take some time to complete, and when completed will
have created a directory called ``Amplicon_analysis-1.2.3`` containing
the dependencies under the specified top-level directory.

**NB** The installed dependencies will occupy around 2.6GB of disk
space.

For the tool to be able to access the dependencies, the ``bin``
subdirectory of this directory must be on Galaxy's ``PATH`` at run
time; for example, add a line like the following to the
``local_env.sh`` file::

    export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
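
To confirm that the dependencies directory is actually on the ``PATH`` seen by the tool, a check along these lines could be run in the same environment. A sketch; ``dir_on_path`` is a hypothetical name, and it compares entries literally (no trailing-slash or symlink normalisation).

```shell
# Hypothetical helper: succeed if DIR is one of the colon-separated
# entries of PATH_VALUE (default: the current $PATH). Whole entries are
# compared, so /opt/x/bin does not match /opt/x/bin2.
dir_on_path() {
    local dir=$1
    local path_value=${2-$PATH}
    case ":$path_value:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: ``dir_on_path /path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin || echo "not on PATH"``.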

History
=======

========== ======================================================================
Version    Changes
========== ======================================================================
1.3.5.0    Updated to Amplicon_Analysis_Pipeline version 1.3.5.
1.2.3.0    Updated to Amplicon_Analysis_Pipeline version 1.2.3; install
           dependencies via tool_dependencies.xml.
1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
           jackknifed analysis which is not captured by the Galaxy tool)
1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
           option to use the Human Oral Microbiome Database v15.1, and
           updates SILVA database to v123)
1.1.0      First official version on Galaxy toolshed.
1.0.6      Expand inline documentation to provide detailed usage guidance.
1.0.5      Updates including:

           - Capture read counts from quality control as new output dataset
           - Capture FastQC per-base quality boxplots for each sample as
             new output dataset
           - Add support for -l option (sliding window length for trimming)
           - Default for -L set to "200"
1.0.4      Various updates:

           - Additional outputs are captured when a "Categories" file is
             supplied (alpha diversity rarefaction curves and boxplots)
           - Sample names derived from Fastqs in a collection of pairs
             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
           - Input Fastqs can now be of more general ``fastq`` type
           - Log file outputs are captured in new output dataset
           - User can specify a "title" for the job which is copied into
             the dataset names (to distinguish outputs from different runs)
           - Improved detection and reporting of problems with input
             Metatable
1.0.3      Take the sample names from the collection dataset names when
           using collection as input (this is now the default input mode);
           collect additional output dataset; disable ``usearch``-based
           pipelines (i.e. ``UPARSE`` and ``QIIME``).
1.0.2      Enable support for FASTQs supplied via dataset collections and
           fix some broken output datasets.
1.0.1      Initial version
========== ======================================================================
204 - Improved detection and reporting of problems with input
205 Metatable
206 1.0.3 Take the sample names from the collection dataset names when
207 using collection as input (this is now the default input mode);
208 collect additional output dataset; disable ``usearch``-based
209 pipelines (i.e. ``UPARSE`` and ``QIIME``).
210 1.0.2 Enable support for FASTQs supplied via dataset collections and
211 fix some broken output datasets.
212 1.0.1 Initial version
213 ========== ======================================================================