comparison Amplicon_analysis-galaxy-update-to-Amplicon_analysis_pipeline-1.3/README.rst @ 41:7b9786a43a16 draft
Uploaded test version 1.3.5.0.
author | pjbriggs |
---|---|
date | Thu, 05 Dec 2019 11:44:03 +0000 |
40:5ef333d1c303 | 41:7b9786a43a16 |
---|---|
1 Amplicon_analysis-galaxy | |
2 ======================== | |
3 | |
4 A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline | |
5 script at https://github.com/MTutino/Amplicon_analysis | |
6 | |
7 The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq | |
8 (Casava >= 1.8) and performs the following operations: | |
9 | |
10 * QC and clean up of input data | |
11 * Removal of singletons and chimeras and building of OTU table | |
12 and phylogenetic tree | |
13 * Beta and alpha diversity analysis | |
14 | |
15 Usage documentation | |
16 =================== | |
17 | |
18 Usage of the tool (including required inputs) is documented within | |
19 the ``help`` section of the tool XML. | |
20 | |
21 Installing the tool in a Galaxy instance | |
22 ======================================== | |
23 | |
24 The following sections describe how to install the tool files, | |
25 dependencies and reference data, and how to configure the Galaxy | |
26 instance to detect the dependencies and reference data correctly | |
27 at run time. | |
28 | |
29 1. Install the tool from the toolshed | |
30 ------------------------------------- | |
31 | |
32 The core tool is hosted on the Galaxy toolshed, so it can be installed | |
33 directly from there (this is the recommended route): | |
34 | |
35 * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/ | |
36 | |
37 Alternatively it can be installed manually; in this case there are two | |
38 files to install: | |
39 | |
40 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition) | |
41 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script) | |
42 | |
43 Put these in a directory that is visible to Galaxy (e.g. a | |
44 ``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml`` | |
45 file to tell Galaxy to offer the tool by adding a line such as:: | |
46 | |
47 <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" /> | |
48 | |
49 2. Install the reference data | |
50 ----------------------------- | |
51 | |
52 The script ``References.sh`` from the pipeline package at | |
53 https://github.com/MTutino/Amplicon_analysis can be run to install | |
54 the reference data, for example:: | |
55 | |
56 cd /path/to/pipeline/data | |
57 wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh | |
58 /bin/bash ./References.sh | |
59 | |
60 will install the data in ``/path/to/pipeline/data``. | |
61 | |
62 **NB** The final amount of data downloaded and uncompressed will be | |
63 around 9GB. | |
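
Because the download is large, it can be worth confirming that the target filesystem has enough free space before running ``References.sh``. The following is a minimal sketch only: the ``has_free_space`` helper is hypothetical (it is not part of the pipeline), and the 9GB figure is the approximate size quoted above rather than an exact requirement.

```shell
# Hypothetical helper (not part of the pipeline): succeeds if the
# filesystem holding directory $1 has at least $2 kilobytes available.
has_free_space() {
    dir="$1"
    required_kb="$2"
    # 'df -Pk' prints POSIX-format output in 1024-byte blocks; the
    # fourth field of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$required_kb" ]
}

# Approximate size of the reference data (around 9GB), in kilobytes
required_kb=$((9 * 1024 * 1024))

if has_free_space . "$required_kb"; then
    echo "Enough free space for the reference data"
else
    echo "Less than ~9GB free in the current directory" >&2
fi
```

Run this from the directory where the data will be installed (``/path/to/pipeline/data`` in the example above).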
64 | |
65 3. Configure reference data location in Galaxy | |
66 ---------------------------------------------- | |
67 | |
68 The final step is to make your Galaxy installation aware of the | |
69 location of the reference data, so that it can locate them when the | |
70 tool is run. | |
71 | |
72 The tool locates the reference data via an environment variable called | |
73 ``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent | |
74 directory where the reference data has been installed. | |
75 | |
76 There are various ways to do this, depending on how your Galaxy | |
77 installation is configured: | |
78 | |
79 * **For local instances:** add a line to set it in the | |
80 ``config/local_env.sh`` file of your Galaxy installation (you | |
81 may need to create a new empty file first), e.g.:: | |
82 | |
83 export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data | |
84 | |
85 * **For production instances:** set the value in the ``job_conf.xml`` | |
86 configuration file, e.g.:: | |
87 | |
88 <destination id="amplicon_analysis"> | |
89 <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env> | |
90 </destination> | |
91 | |
92 and then specify that the pipeline tool uses this destination:: | |
93 | |
94 <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/> | |
95 | |
96 (For more about job destinations see the Galaxy documentation at | |
97 https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations) | |
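
Whichever method is used, a quick sanity check can confirm that the variable is set and points at a real directory. The sketch below is illustrative only: ``check_ref_data_path`` is a hypothetical helper, not something provided by the tool or by Galaxy, and it only tests the environment of the shell it is run in (which may differ from the environment of a Galaxy job).

```shell
# Hypothetical helper: succeeds only if the given path is a non-empty
# string naming an existing directory.
check_ref_data_path() {
    ref_path="$1"
    if [ -z "$ref_path" ]; then
        echo "AMPLICON_ANALYSIS_REF_DATA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$ref_path" ]; then
        echo "Reference data directory not found: $ref_path" >&2
        return 1
    fi
    echo "Reference data directory found: $ref_path"
}

# Report on the current environment without aborting the shell
check_ref_data_path "${AMPLICON_ANALYSIS_REF_DATA_PATH:-}" || true
```

For a local instance, source ``config/local_env.sh`` first so that the check sees the same value as Galaxy does.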
98 | |
99 4. Enable rendering of HTML outputs from pipeline | |
100 ------------------------------------------------- | |
101 | |
102 To ensure that HTML outputs are displayed correctly in Galaxy | |
103 (for example the Vsearch OTU table heatmaps), Galaxy needs to be | |
104 configured not to sanitize the outputs from the ``Amplicon_analysis`` | |
105 tool. | |
106 | |
107 Either: | |
108 | |
109 * **For local instances:** set ``sanitize_all_html = False`` in | |
110 ``config/galaxy.ini`` (NB: do not do this on production servers or | |
111 public instances!); or | |
112 | |
113 * **For production instances:** add the ``Amplicon_analysis`` tool | |
114 to the display whitelist in the Galaxy instance: | |
115 | |
116 - Set ``sanitize_whitelist_file = config/whitelist.txt`` in | |
117 ``config/galaxy.ini`` and restart Galaxy; | |
118 - Go to ``Admin>Manage Display Whitelist``, check the box for | |
119 ``Amplicon_analysis`` (hint: use your browser's 'find-in-page' | |
120 search function to help locate it) and click on | |
121 ``Submit new whitelist`` to update the settings. | |
122 | |
123 Additional details | |
124 ================== | |
125 | |
126 Some other things to be aware of: | |
127 | |
128 * Note that using the SILVA database requires a minimum of 18GB of RAM. | |
129 | |
130 Known problems | |
131 ============== | |
132 | |
133 * Only the ``VSEARCH`` pipeline in Mauro's script is currently | |
134 available via the Galaxy tool; the ``USEARCH`` and ``QIIME`` | |
135 pipelines have yet to be implemented. | |
136 * The images in the tool help section are not visible if the | |
137 tool has been installed locally, or if it has been installed in | |
138 a Galaxy instance which is served from a subdirectory. | |
139 | |
140 Both are problems with Galaxy rather than with the tool; see | |
141 https://github.com/galaxyproject/galaxy/issues/4490 and | |
142 https://github.com/galaxyproject/galaxy/issues/1676 | |
143 | |
144 Appendix: installing the dependencies manually | |
145 ============================================== | |
146 | |
147 If the tool is installed from the Galaxy toolshed (recommended) then | |
148 the dependencies should be installed automatically and this step can | |
149 be skipped. | |
150 | |
151 Otherwise the ``install_amplicon_analysis_deps.sh`` script can be used | |
152 to fetch and install the dependencies locally, for example:: | |
153 | |
154 install_amplicon_analysis_deps.sh /path/to/local_tool_dependencies | |
155 | |
156 (This is the same script as is used to install dependencies from the | |
157 toolshed.) This can take some time to complete, and when completed will | |
158 have created a directory called ``Amplicon_analysis-1.2.3`` containing | |
159 the dependencies under the specified top level directory. | |
160 | |
161 **NB** The installed dependencies will occupy around 2.6GB of disk | |
162 space. | |
163 | |
164 You will need to make sure that the ``bin`` subdirectory of this | |
165 directory is on Galaxy's ``PATH`` at runtime, for the tool to be able | |
166 to access the dependencies - for example by adding a line to the | |
167 ``local_env.sh`` file like:: | |
168 | |
169 export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH | |
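
To check that the change has taken effect, the ``PATH`` can be inspected from a shell that has sourced ``local_env.sh``. This is a minimal sketch under that assumption; ``path_contains`` is a hypothetical helper, and the directory shown is the placeholder path from the example above.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in PATH.
# Wrapping both sides in ':' ensures that only whole entries match.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder location taken from the example above
deps_bin=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin

if path_contains "$deps_bin"; then
    echo "$deps_bin is on PATH"
else
    echo "$deps_bin is NOT on PATH" >&2
fi
```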
170 | |
171 History | |
172 ======= | |
173 | |
174 ========== ====================================================================== | |
175 Version Changes | |
176 ---------- ---------------------------------------------------------------------- | |
177 1.3.5.0 Updated to Amplicon_Analysis_Pipeline version 1.3.5. | |
178 1.2.3.0 Updated to Amplicon_Analysis_Pipeline version 1.2.3; install | |
179 dependencies via tool_dependencies.xml. | |
180 1.2.2.0 Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes | |
181 jackknifed analysis which is not captured by Galaxy tool) | |
182 1.2.1.0 Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds | |
183 option to use the Human Oral Microbiome Database v15.1, and | |
184 updates SILVA database to v123) | |
185 1.1.0 First official version on Galaxy toolshed. | |
186 1.0.6 Expand inline documentation to provide detailed usage guidance. | |
187 1.0.5 Updates including: | |
188 | |
189 - Capture read counts from quality control as new output dataset | |
190 - Capture FastQC per-base quality boxplots for each sample as | |
191 new output dataset | |
192 - Add support for -l option (sliding window length for trimming) | |
193 - Default for -L set to "200" | |
194 1.0.4 Various updates: | |
195 | |
196 - Additional outputs are captured when a "Categories" file is | |
197 supplied (alpha diversity rarefaction curves and boxplots) | |
198 - Sample names derived from Fastqs in a collection of pairs | |
199 are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames) | |
200 - Input Fastqs can now be of more general ``fastq`` type | |
201 - Log file outputs are captured in new output dataset | |
202 - User can specify a "title" for the job which is copied into | |
203 the dataset names (to distinguish outputs from different runs) | |
204 - Improved detection and reporting of problems with input | |
205 Metatable | |
206 1.0.3 Take the sample names from the collection dataset names when | |
207 using collection as input (this is now the default input mode); | |
208 collect additional output dataset; disable ``usearch``-based | |
209 pipelines (i.e. ``UPARSE`` and ``QIIME``). | |
210 1.0.2 Enable support for FASTQs supplied via dataset collections and | |
211 fix some broken output datasets. | |
212 1.0.1 Initial version | |
213 ========== ====================================================================== |