Amplicon_analysis-galaxy
========================

A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
script at https://github.com/MTutino/Amplicon_analysis

The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
(Casava >= 1.8) and performs the following operations:

 * QC and clean up of input data
 * Removal of singletons and chimeras and building of OTU table
   and phylogenetic tree
 * Alpha and beta diversity analysis

Usage documentation
===================

Usage of the tool (including required inputs) is documented within
the ``help`` section of the tool XML.

Installing the tool in a Galaxy instance
========================================

The following sections describe how to install the tool files,
dependencies and reference data, and how to configure the Galaxy
instance to detect the dependencies and reference data correctly
at run time.

1. Install the tool from the toolshed
-------------------------------------

The core tool is hosted on the Galaxy toolshed, so it can be installed
directly from there (this is the recommended route):

 * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/

Alternatively it can be installed manually; in this case there are two
files to install:

 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)

Put these in a directory that is visible to Galaxy (e.g. a
``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
file to tell Galaxy to offer the tool by adding a line such as::

    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />

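As a minimal sketch of the manual route, the two files could be copied
into place with a small shell helper. The function name
``install_tool_files`` and the directory arguments are hypothetical and
are not part of the tool itself::

    # Hypothetical helper: copy the two tool files from directory $1
    # into directory $2, creating $2 first if it does not exist.
    install_tool_files() {
        mkdir -p "$2" &&
        cp "$1/amplicon_analysis_pipeline.xml" \
           "$1/amplicon_analysis_pipeline.py" "$2/"
    }

It might then be called as ``install_tool_files /path/to/downloaded/files
/path/to/galaxy/tools/Amplicon_analysis``, where both paths are
placeholders for your own locations.
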
2. Install the reference data
-----------------------------

The script ``References.sh`` from the pipeline package at
https://github.com/MTutino/Amplicon_analysis can be run to install
the reference data. For example, running::

    cd /path/to/pipeline/data
    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
    /bin/bash ./References.sh

installs the data in ``/path/to/pipeline/data``.

**NB** The final amount of data downloaded and uncompressed will be
around 9 GB.

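Because the download is large, it may be worth checking the free space
on the target filesystem first. The following is only a sketch: the
helper name ``has_free_space`` is hypothetical, and the 9 GB threshold
is taken from the approximate figure above, so allow extra headroom::

    # Hypothetical helper: succeed if the filesystem holding directory $1
    # has at least $2 kilobytes available.
    has_free_space() {
        # df -P prints a header line followed by one line per filesystem;
        # with -k the fourth field is the available space in 1024-byte units.
        avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
        [ "$avail_kb" -ge "$2" ]
    }

    # 9 GB expressed in kilobytes (9 * 1024 * 1024)
    required_kb=9437184

A call such as ``has_free_space /path/to/pipeline/data "$required_kb"``
returns a non-zero status when there is not enough room.
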
3. Configure reference data location in Galaxy
----------------------------------------------

The final step is to make your Galaxy installation aware of the
location of the reference data, so it can locate the data when the
tool is run.

The tool locates the reference data via an environment variable called
``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
directory where the reference data has been installed.

There are various ways to do this, depending on how your Galaxy
installation is configured:

 * **For local instances:** add a line to set it in the
   ``config/local_env.sh`` file of your Galaxy installation (you
   may need to create a new empty file first), e.g.::

       export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data

 * **For production instances:** set the value in the ``job_conf.xml``
   configuration file, e.g.::

       <destination id="amplicon_analysis">
          <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
       </destination>

   and then specify that the pipeline tool uses this destination::

       <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>

   (For more about job destinations see the Galaxy documentation at
   https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)

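Whichever route is used, a quick sanity check in the same environment
can confirm that the variable is set and points at a directory. This is
a sketch only; the helper name ``check_ref_data_path`` is hypothetical
and it does not inspect the contents of the directory::

    # Hypothetical helper: report whether AMPLICON_ANALYSIS_REF_DATA_PATH
    # is set and names an existing directory.
    check_ref_data_path() {
        if [ -z "${AMPLICON_ANALYSIS_REF_DATA_PATH:-}" ]; then
            echo "AMPLICON_ANALYSIS_REF_DATA_PATH is not set" >&2
            return 1
        fi
        if [ ! -d "$AMPLICON_ANALYSIS_REF_DATA_PATH" ]; then
            echo "Not a directory: $AMPLICON_ANALYSIS_REF_DATA_PATH" >&2
            return 1
        fi
        echo "Reference data directory: $AMPLICON_ANALYSIS_REF_DATA_PATH"
    }

Running ``check_ref_data_path`` after sourcing ``config/local_env.sh``
should print the directory, or return a non-zero status with a message
on standard error.
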
4. Enable rendering of HTML outputs from pipeline
-------------------------------------------------

To ensure that HTML outputs are displayed correctly in Galaxy
(for example the Vsearch OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the ``Amplicon_analysis``
tool.

Either:

 * **For local instances:** set ``sanitize_all_html = False`` in
   ``config/galaxy.ini`` (**NB** don't do this on production servers or
   public instances!); or

 * **For production instances:** add the ``Amplicon_analysis`` tool
   to the display whitelist in the Galaxy instance:

   - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
     ``config/galaxy.ini`` and restart Galaxy;
   - Go to ``Admin>Manage Display Whitelist``, check the box for
     ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
     search function to help locate it) and click on
     ``Submit new whitelist`` to update the settings.

Additional details
==================

Some other things to be aware of:

 * Using the Silva database requires a minimum of 18 GB of RAM.

Known problems
==============

 * Only the ``VSEARCH`` pipeline in Mauro's script is currently
   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
   pipelines have yet to be implemented.
 * The images in the tool help section are not visible if the
   tool has been installed locally, or if it has been installed in
   a Galaxy instance which is served from a subdirectory.

   These are both problems with Galaxy and not the tool; see
   https://github.com/galaxyproject/galaxy/issues/4490 and
   https://github.com/galaxyproject/galaxy/issues/1676

Appendix: installing the dependencies manually
==============================================

If the tool is installed from the Galaxy toolshed (recommended) then
the dependencies should be installed automatically and this step can
be skipped.

Otherwise the ``install_amplicon_analysis_deps.sh`` script can be used
to fetch and install the dependencies locally, for example::

    install_amplicon_analysis.sh /path/to/local_tool_dependencies

(This is the same script as is used to install dependencies from the
toolshed.) This can take some time to complete; when finished it will
have created a directory called ``Amplicon_analysis-1.2.3`` containing
the dependencies under the specified top level directory.

**NB** The installed dependencies will occupy around 2.6 GB of disk
space.

You will need to make sure that the ``bin`` subdirectory of this
directory is on Galaxy's ``PATH`` at runtime, for the tool to be able
to access the dependencies, for example by adding a line to the
``local_env.sh`` file like::

    export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH

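To confirm that the ``bin`` directory really is on ``PATH`` in a given
environment, a small check along the following lines could be used. The
helper name ``dir_on_path`` is hypothetical, and the comparison is a
plain string match, so the directory must be written exactly as it
appears in ``PATH``::

    # Hypothetical helper: succeed if directory $1 is listed in PATH.
    dir_on_path() {
        # Wrap PATH in colons so that the first and last entries can be
        # matched in the same way as entries in the middle.
        case ":$PATH:" in
            *":$1:"*) return 0 ;;
            *) return 1 ;;
        esac
    }

For example, ``dir_on_path
/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin`` returns
zero only when that exact directory is listed.
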
History
=======

========== ======================================================================
Version    Changes
========== ======================================================================
1.3.5.0    Updated to Amplicon_Analysis_Pipeline version 1.3.5.
1.2.3.0    Updated to Amplicon_Analysis_Pipeline version 1.2.3; install
           dependencies via tool_dependencies.xml.
1.2.2.0    Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes
           jackknifed analysis which is not captured by Galaxy tool)
1.2.1.0    Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds
           option to use the Human Oral Microbiome Database v15.1, and
           updates SILVA database to v123)
1.1.0      First official version on Galaxy toolshed.
1.0.6      Expand inline documentation to provide detailed usage guidance.
1.0.5      Updates including:

           - Capture read counts from quality control as new output dataset
           - Capture FastQC per-base quality boxplots for each sample as
             new output dataset
           - Add support for -l option (sliding window length for trimming)
           - Default for -L set to "200"
1.0.4      Various updates:

           - Additional outputs are captured when a "Categories" file is
             supplied (alpha diversity rarefaction curves and boxplots)
           - Sample names derived from Fastqs in a collection of pairs
             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
           - Input Fastqs can now be of more general ``fastq`` type
           - Log file outputs are captured in new output dataset
           - User can specify a "title" for the job which is copied into
             the dataset names (to distinguish outputs from different runs)
           - Improved detection and reporting of problems with input
             Metatable
1.0.3      Take the sample names from the collection dataset names when
           using collection as input (this is now the default input mode);
           collect additional output dataset; disable ``usearch``-based
           pipelines (i.e. ``UPARSE`` and ``QIIME``).
1.0.2      Enable support for FASTQs supplied via dataset collections and
           fix some broken output datasets.
1.0.1      Initial version
========== ======================================================================