comparison README.rst @ 0:b433086738d6 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit ba3e5b591407db52a586361efb21927c8171ec0e
| author | pjbriggs | 
|---|---|
| date | Wed, 08 Nov 2017 08:43:02 -0500 | 
| parents | |
| children | a00f366adc45 | 
| -1:000000000000 | 0:b433086738d6 | 
|---|---|
| 1 Amplicon_analysis-galaxy | |
| 2 ======================== | |
| 3 | |
| 4 A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline | |
| 5 script at https://github.com/MTutino/Amplicon_analysis | |
| 6 | |
| 7 The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq | |
| 8 (Casava >= 1.8) and performs the following operations: | |
| 9 | |
| 10 * QC and clean up of input data | |
| 11 * Removal of singletons and chimeras and building of OTU table | |
| 12 and phylogenetic tree | |
| 13 * Beta and alpha diversity analysis | |
| 14 | |
| 15 Usage documentation | |
| 16 =================== | |
| 17 | |
| 18 Usage of the tool (including required inputs) is documented within | |
| 19 the ``help`` section of the tool XML. | |
| 20 | |
| 21 Installing the tool in a Galaxy instance | |
| 22 ======================================== | |
| 23 | |
| 24 The tool is not currently hosted on a Galaxy toolshed, so both the tool | |
| 25 files and the dependencies must be installed manually. In addition, | |
| 26 it is necessary to fetch and install the reference data. | |
| 27 | |
| 28 1. Install the dependencies | |
| 29 --------------------------- | |
| 30 | |
| 31 The ``install_tool_deps.sh`` script can be used to fetch and install the | |
| 32 dependencies locally, for example:: | |
| 33 | |
| 34 install_tool_deps.sh /path/to/local_tool_dependencies | |
| 35 | |
| 36 This can take some time to complete. When finished it should have | |
| 37 created a set of directories containing the dependencies under the | |
| 38 specified top level directory. | |
| 39 | |
| 40 2. Install the tool files | |
| 41 ------------------------- | |
| 42 | |
| 43 There are two files to install: | |
| 44 | |
| 45 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition) | |
| 46 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script) | |
| 47 | |
| 48 Put these in a directory that is visible to Galaxy (e.g. a | |
| 49 ``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml`` | |
| 50 file to tell Galaxy to offer the tool, e.g. by adding the line:: | |
| 51 | |
| 52 <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" /> | |
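Before restarting Galaxy it can be worth confirming that both tool files actually landed in the chosen directory. The following is a minimal sketch only: the function name ``tool_files_present`` is hypothetical and not part of the tool or of Galaxy, and the directory passed to it is whatever location was chosen above.

```shell
# Hypothetical helper (not shipped with the tool): check that both tool
# files named in step 2 exist in the directory given as the first argument.
tool_files_present() {
    tool_dir="$1"
    for f in amplicon_analysis_pipeline.xml amplicon_analysis_pipeline.py; do
        if [ ! -f "$tool_dir/$f" ]; then
            # Report the first file that could not be found and stop
            echo "missing: $f"
            return 1
        fi
    done
    echo "all present"
}
```

It could be called as, for example, ``tool_files_present /path/to/galaxy/tools/Amplicon_analysis``, where the path is a placeholder for the real location.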
| 53 | |
| 54 3. Install the reference data | |
| 55 ----------------------------- | |
| 56 | |
| 57 The script ``References.sh`` from the pipeline package at | |
| 58 https://github.com/MTutino/Amplicon_analysis can be run to install | |
| 59 the reference data, for example:: | |
| 60 | |
| 61 cd /path/to/pipeline/data | |
| 62 wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh | |
| 63 /bin/bash ./References.sh | |
| 64 | |
| 65 will install the data in ``/path/to/pipeline/data``. | |
| 66 | |
| 67 **NB** The final amount of data downloaded and uncompressed will be | |
| 68 around 6GB. | |
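Given the size of the download, a quick free-space check before running ``References.sh`` may save a failed install. This is a sketch under assumptions: the helper name ``has_free_space`` is hypothetical, and the threshold passed to it is the reader's own choice (the 6GB figure above is approximate).

```shell
# Hypothetical pre-flight check (not part of the pipeline): succeed if the
# filesystem holding directory $1 has at least $2 gigabytes available.
has_free_space() {
    target_dir="$1"
    required_gb="$2"
    # df -P gives POSIX-format output (one line per filesystem) and -k
    # reports 1024-byte blocks; the fourth column is the available space.
    available_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    required_kb=$((required_gb * 1024 * 1024))
    [ "$available_kb" -ge "$required_kb" ]
}
```

For example, ``has_free_space /path/to/pipeline/data 6 || echo "not enough space"`` would warn before the download starts.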
| 69 | |
| 70 4. Configure dependencies and reference data in Galaxy | |
| 71 ------------------------------------------------------ | |
| 72 | |
| 73 The final steps are to make your Galaxy installation aware of the | |
| 74 tool dependencies and reference data, so it can locate them both when | |
| 75 the tool is run. | |
| 76 | |
| 77 To target the tool dependencies installed previously, add the | |
| 78 following lines to the ``dependency_resolvers_conf.xml`` file in the | |
| 79 Galaxy ``config`` directory:: | |
| 80 | |
| 81 <dependency_resolvers> | |
| 82 ... | |
| 83 <galaxy_packages base_path="/path/to/local_tool_dependencies" /> | |
| 84 <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" /> | |
| 85 ... | |
| 86 </dependency_resolvers> | |
| 87 | |
| 88 (NB it is recommended to place these *before* the ``<conda ... />`` | |
| 89 resolvers) | |
| 90 | |
| 91 (If you're not familiar with dependency resolvers in Galaxy then | |
| 92 see the documentation at | |
| 93 https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html | |
| 94 for more details.) | |
| 95 | |
| 96 The tool locates the reference data via an environment variable called | |
| 97 ``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent | |
| 98 directory where the reference data has been installed. | |
| 99 | |
| 100 There are various ways to do this, depending on how your Galaxy | |
| 101 installation is configured: | |
| 102 | |
| 103 * **For local instances:** add a line to set it in the | |
| 104 ``config/local_env.sh`` file of your Galaxy installation, e.g.:: | |
| 105 | |
| 106 export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data | |
| 107 | |
| 108 * **For production instances:** set the value in the ``job_conf.xml`` | |
| 109 configuration file, e.g.:: | |
| 110 | |
| 111 <destination id="amplicon_analysis"> | |
| 112 <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env> | |
| 113 </destination> | |
| 114 | |
| 115 and then specify that the pipeline tool uses this destination:: | |
| 116 | |
| 117 <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/> | |
| 118 | |
| 119 (For more about job destinations see the Galaxy documentation at | |
| 120 https://galaxyproject.org/admin/config/jobs/#job-destinations) | |
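Whichever method is used, the setting can be sanity-checked from a shell that shares the job environment. The sketch below is illustrative only: ``check_ref_data_path`` is a hypothetical helper, and it tests only that the variable is set and names an existing directory, not that the reference data inside is complete.

```shell
# Hypothetical helper (not part of the tool): report whether
# AMPLICON_ANALYSIS_REF_DATA_PATH is set and points at a directory.
check_ref_data_path() {
    ref_path="${AMPLICON_ANALYSIS_REF_DATA_PATH:-}"
    if [ -z "$ref_path" ]; then
        # Variable is not defined (or is empty) in this environment
        echo "unset"
        return 1
    fi
    if [ ! -d "$ref_path" ]; then
        # Variable is defined but the directory does not exist
        echo "missing"
        return 1
    fi
    echo "ok"
}
```

Running ``check_ref_data_path`` prints ``unset``, ``missing`` or ``ok``, and returns a non-zero status in the first two cases.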
| 121 | |
| 122 5. Enable rendering of HTML outputs from pipeline | |
| 123 ------------------------------------------------- | |
| 124 | |
| 125 To ensure that HTML outputs are displayed correctly in Galaxy | |
| 126 (for example the Vsearch OTU table heatmaps), Galaxy needs to be | |
| 127 configured not to sanitize the outputs from the ``Amplicon_analysis`` | |
| 128 tool. | |
| 129 | |
| 130 Either: | |
| 131 | |
| 132 * **For local instances:** set ``sanitize_all_html = False`` in | |
| 133 ``config/galaxy.ini`` (NB don't do this on production servers or | |
| 134 public instances!); or | |
| 135 | |
| 136 * **For production instances:** add the ``Amplicon_analysis`` tool | |
| 137 to the display whitelist in the Galaxy instance: | |
| 138 | |
| 139 - Set ``sanitize_whitelist_file = config/whitelist.txt`` in | |
| 140 ``config/galaxy.ini`` and restart Galaxy; | |
| 141 - Go to ``Admin>Manage Display Whitelist``, check the box for | |
| 142 ``Amplicon_analysis`` (hint: use your browser's 'find-in-page' | |
| 143 search function to help locate it) and click on | |
| 144 ``Submit new whitelist`` to update the settings. | |
| 145 | |
| 146 Additional details | |
| 147 ================== | |
| 148 | |
| 149 Some other things to be aware of: | |
| 150 | |
| 151 * Using the Silva database requires a minimum of 18GB of RAM | |
| 152 | |
| 153 Known problems | |
| 154 ============== | |
| 155 | |
| 156 * Only the ``VSEARCH`` pipeline in Mauro's script is currently | |
| 157 available via the Galaxy tool; the ``USEARCH`` and ``QIIME`` | |
| 158 pipelines have yet to be implemented. | |
| 159 * The images in the tool help section are not visible if the | |
| 160 tool has been installed locally, or if it has been installed in | |
| 161 a Galaxy instance which is served from a subdirectory. | |
| 162 | |
| 163 These are both problems with Galaxy rather than with the tool; see | |
| 164 https://github.com/galaxyproject/galaxy/issues/4490 and | |
| 165 https://github.com/galaxyproject/galaxy/issues/1676 | |
| 166 | |
| 167 Appendix: availability of tool dependencies | |
| 168 =========================================== | |
| 169 | |
| 170 The tool takes its dependencies from the underlying pipeline script (see | |
| 171 https://github.com/MTutino/Amplicon_analysis/blob/master/README.md | |
| 172 for details). | |
| 173 | |
| 174 As noted above, currently the ``install_tool_deps.sh`` script can be | |
| 175 used to manually install the dependencies for a local tool install. | |
| 176 | |
| 177 In principle these should also be available if the tool were installed | |
| 178 from a toolshed. However, it would be preferable in this case to get as | |
| 179 many of the dependencies as possible via the ``conda`` dependency | |
| 180 resolver. | |
| 181 | |
| 182 The following are known to be available via conda, with the required | |
| 183 version: | |
| 184 | |
| 185 - cutadapt 1.8.1 | |
| 186 - sickle-trim 1.33 | |
| 187 - bioawk 1.0 | |
| 188 - fastqc 0.11.3 | |
| 189 - R 3.2.0 | |
| 190 | |
| 191 Some dependencies are available but with the "wrong" versions: | |
| 192 | |
| 193 - spades (need 3.5.0) | |
| 194 - qiime (need 1.8.0) | |
| 195 - blast (need 2.2.26) | |
| 196 - vsearch (need 1.1.3) | |
| 197 | |
| 198 The following dependencies are currently unavailable: | |
| 199 | |
| 200 - fasta_number (need 02jun2015) | |
| 201 - fasta-splitter (need 0.2.4) | |
| 202 - rdp_classifier (need 2.2) | |
| 203 - microbiomeutil (need r20110519) | |
| 204 | |
| 205 (NB usearch 6.1.544 and 8.0.1623 are special cases which must be | |
| 206 handled outside of Galaxy's dependency management systems.) | |
| 207 | |
| 208 History | |
| 209 ======= | |
| 210 | |
| 211 ========== ====================================================================== | |
| 212 Version Changes | |
| 213 ---------- ---------------------------------------------------------------------- | |
| 214 1.1.0 First official version on Galaxy toolshed. | |
| 215 1.0.6 Expand inline documentation to provide detailed usage guidance. | |
| 216 1.0.5 Updates including: | |
| 217 | |
| 218 - Capture read counts from quality control as new output dataset | |
| 219 - Capture FastQC per-base quality boxplots for each sample as | |
| 220 new output dataset | |
| 221 - Add support for -l option (sliding window length for trimming) | |
| 222 - Default for -L set to "200" | |
| 223 1.0.4 Various updates: | |
| 224 | |
| 225 - Additional outputs are captured when a "Categories" file is | |
| 226 supplied (alpha diversity rarefaction curves and boxplots) | |
| 227 - Sample names derived from Fastqs in a collection of pairs | |
| 228 are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames) | |
| 229 - Input Fastqs can now be of more general ``fastq`` type | |
| 230 - Log file outputs are captured in new output dataset | |
| 231 - User can specify a "title" for the job which is copied into | |
| 232 the dataset names (to distinguish outputs from different runs) | |
| 233 - Improved detection and reporting of problems with input | |
| 234 Metatable | |
| 235 1.0.3 Take the sample names from the collection dataset names when | |
| 236 using collection as input (this is now the default input mode); | |
| 237 collect additional output dataset; disable ``usearch``-based | |
| 238 pipelines (i.e. ``UPARSE`` and ``QIIME``). | |
| 239 1.0.2 Enable support for FASTQs supplied via dataset collections and | |
| 240 fix some broken output datasets. | |
| 241 1.0.1 Initial version | |
| 242 ========== ====================================================================== | 
