comparison readme.rst @ 3:d6553277b759 draft

Uploaded
author rnateam
date Tue, 21 Jan 2014 04:57:28 -0500
parents ba161910b46f
children
2:e9b2400cc569 3:d6553277b759
1
2
1 This package is a Galaxy workflow for the BlockClust pipeline. 3 This package is a Galaxy workflow for the BlockClust pipeline.
2 4
3 It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of 5
4 genes to generate gene predictions on a new genome, and then calls EMBOSS 6 ======
5 (Rice et al. 2000) to translate the predictions into a FASTA file of 7 Galaxy
6 predicted protein sequences. The workflow requires two input files: 8 ======
7 9
8 * Nucleotide FASTA file of known gene sequences (training set) 10 `Galaxy <http://galaxyproject.org/>`_ is an open, web-based platform for data-intensive research.
9 * Nucleotide FASTA file of genome sequence or assembled contigs 11 All tools can be combined in workflows without any need for programming skills.
10 12 Furthermore, the platform can be extended with more tools at any time.
11 First an interpolated context model (ICM) is built from the set of known 13 Each tool has its own information about what it does and what the input is supposed to look like.
12 genes, preferably from the closest relative organism(s) available. Next this 14 You can make data available for Galaxy by uploading local files or downloading online content.
13 ICM model is used to predict genes on the genomic FASTA file. This produces 15 Input files, workflow steps and results are stored in a history where you can view them or access them again later.
14 a FASTA file of the predicted gene nucleotide sequences, which is translated 16 It is possible to share workflows and histories with other users or make them publicly available.
15 into protein sequences using the EMBOSS tool transeq. 17 Saved workflows can be used with new input files or just to rerun an analysis, which ensures repeatability.
16 18
17 Glimmer is intended for finding genes in microbial DNA, especially bacteria, 19
18 archaea, and viruses. 20
19 21 Getting Started
20 See http://www.galaxyproject.org for information about the Galaxy Project. 22 ===============
23
24 BlockClust can be installed on all common Unix systems.
25 However, it is developed on Linux and I don't have access to OS X. You are welcome to help improve this documentation; just contact_ me.
26
27 For any additional information, especially cluster configuration or general Galaxy_ questions,
28 please have a look at the Galaxy Wiki.
29
30 - http://wiki.galaxyproject.org/
31
32 - http://wiki.galaxyproject.org/Admin/
33
34 - http://galaxyproject.org/search/web/
35
36 .. _contact: https://github.com/bgruening
37 .. _Galaxy: http://galaxyproject.org/
38
39 Prerequisites::
40
41 * Python 2.6 or 2.7
42 * standard C compiler, C++ and Fortran compiler
43 * Autotools
44 * CMake
45 * cairo development files (used for PNG depictions)
46 * python development files
47 * Java Runtime Environment (JRE, used by OPSIN and NPLS)
48
49 To install all of the prerequisites you can run the following command, depending on your OS:
50
51 - Debian based systems: apt-get install build-essential gfortran cmake mercurial libcairo2-dev python-dev
52 - Fedora: yum install make automake gcc gcc-c++ gcc-gfortran cmake mercurial libcairo2-devel python-devel
53 - OS X (MacPorts_): port install gcc cmake automake mercurial cairo-devel
54
55 .. _MacPorts: http://www.macports.org/
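
Before starting the installation, it can help to confirm that the build tools listed above are actually on your ``PATH``. The following is a minimal sketch, not part of BlockClust or Galaxy: the helper names (``find_on_path``, ``missing_tools``) and the executable names in ``REQUIRED`` are assumptions and may differ on your distribution.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, path=None):
    """Return the entries of `names` that cannot be found on the PATH."""
    return [name for name in names if find_on_path(name, path) is None]


# Assumed executable names for the prerequisites; adjust as needed.
REQUIRED = ["gcc", "g++", "gfortran", "cmake", "automake", "hg", "java"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

The check only looks for executables; development headers such as the cairo and Python development files still have to be verified through your package manager.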
56
57
58 ===================
59 Galaxy installation
60 ===================
61
62
63 0. Create a sand-boxed Python using virtualenv_ (not necessary but recommended)::
64
65 wget https://raw.github.com/pypa/virtualenv/master/virtualenv.py
66 python ./virtualenv.py --no-site-packages galaxy_env
67 . ./galaxy_env/bin/activate
68
69 .. _virtualenv: http://www.virtualenv.org/
70
71
72 1. Clone the latest `Galaxy platform`_::
73
74 hg clone https://bitbucket.org/galaxy/galaxy-central/
75
76 .. _Galaxy platform: http://wiki.galaxyproject.org/Admin/Get%20Galaxy
77
78 2. Navigate to the galaxy-central folder and update it::
79
80 cd ~/galaxy-central
81 hg pull
82 hg update
83
84 This step is not necessary if you have a fresh checkout. Anyway, it is good to know ;)
85
86 3. Create folders for toolshed and dependencies::
87
88 mkdir ~/shed_tools
89 mkdir ~/galaxy-central/tool_deps
90
91 4. Create configuration file::
92
93 cp ~/galaxy-central/universe_wsgi.ini.sample ~/galaxy-central/universe_wsgi.ini
94
95 5. Open universe_wsgi.ini and change the dependencies directory::
96
97 LINUX: gedit ~/galaxy-central/universe_wsgi.ini
98 OS X: open -a TextEdit ~/galaxy-central/universe_wsgi.ini
99
100 6. Search for ``tool_dependency_dir = None`` and change it to ``tool_dependency_dir = ./tool_deps``, remove the ``#`` if needed
101
102 7. Remove the ``#`` in front of ``tool_config_file`` and ``tool_path``
103
104 8. (Re-)Start the Galaxy daemon::
105
106 sh run.sh --reload
107
108 In daemon mode all logs will be written to main.log in your Galaxy Home directory. You can also use::
109
110 run.sh
111
112 During the first startup Galaxy will prepare your database. That can take some time. Have a look at the log file if you want to know what happens.
113
114 After launching, Galaxy is accessible via the browser at ``http://localhost:8080/``.
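
The edits to ``universe_wsgi.ini`` in steps 6 and 7 can also be scripted instead of done by hand. The sketch below works on the file content as a string. The helper names ``set_ini_option`` and ``uncomment_option`` are hypothetical, and the values in ``sample`` are placeholders rather than the real defaults shipped with Galaxy.

```python
import re


def set_ini_option(text, key, value):
    """Uncomment `key` if needed and set it to `value`; append it if absent."""
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    new_line = "%s = %s" % (key, value)
    if pattern.search(text):
        # Replace only the first matching line.
        return pattern.sub(lambda match: new_line, text, count=1)
    return text.rstrip("\n") + "\n" + new_line + "\n"


def uncomment_option(text, key):
    """Remove a leading '#' in front of `key`, keeping its current value."""
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(" + re.escape(key) + r"[ \t]*=.*)$", re.MULTILINE)
    return pattern.sub(lambda match: match.group(1), text, count=1)


# Placeholder content standing in for the relevant lines of the real file.
sample = (
    "#tool_dependency_dir = None\n"
    "#tool_config_file = tool_conf.xml,shed_tool_conf.xml\n"
    "#tool_path = tools\n"
)

updated = set_ini_option(sample, "tool_dependency_dir", "./tool_deps")
updated = uncomment_option(updated, "tool_config_file")
updated = uncomment_option(updated, "tool_path")
print(updated)
```

To apply this to the real file, read ``~/galaxy-central/universe_wsgi.ini`` into a string, run the same calls, and write the result back after making a backup copy.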
115
116
117
118 =======================
119 Tool Shed configuration
120 =======================
121
122 - Register a new user account in your Galaxy instance: Top Panel → User → Register
123 - Become an admin
124 - open ``universe_wsgi.ini`` in your favourite text editor (gedit universe_wsgi.ini)
125 - search for ``admin_users = None`` and change it to ``admin_users = EMAIL_ADDRESS`` (your Galaxy Username)
126 - remove the ``#`` if needed
127 - restart Galaxy
128
129 ::
130
131 sh run.sh --reload
132
133
134 =======================
135 BlockClust installation
136 =======================
137
138 BlockClust will automatically download and compile all requirements,
139 like EDeN, samtools and so on. It can take up to 1-2 hours.
140
141
142 Installation via Galaxy API (recommended)
143 =========================================
144
145 - Generate an `API Key`_
146 - Run the installation script::
147
148 python ./scripts/api/install_tool_shed_repositories.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow --tool-deps --repository-deps --panel-section-name ChemicalToolBoX
149
150 The -r argument specifies the version of ChemicalToolBoX. You can get the latest revision number from the
151 `test tool shed`_ or with the following command::
152
153 hg identify http://toolshed.g2.bx.psu.edu/repos/bgruening/chemicaltoolbox
154
155 You can watch the installation status under: Top Panel → Admin → Manage installed tool shed repositories
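
Because the installation command is long, it can be easier to assemble it from named values. The sketch below only builds the argument list for the script call shown above. The function name ``build_install_command`` is hypothetical, and every flag and value is copied from the command in this section.

```python
def build_install_command(api_key, galaxy_url, tool_shed_url, owner,
                          revision, name, panel_section):
    """Return the argument list for install_tool_shed_repositories.py."""
    return [
        "python", "./scripts/api/install_tool_shed_repositories.py",
        "--api", api_key,
        "-l", galaxy_url,
        "--url", tool_shed_url,
        "-o", owner,
        "-r", revision,
        "--name", name,
        "--tool-deps", "--repository-deps",
        "--panel-section-name", panel_section,
    ]


command = build_install_command(
    api_key="YOUR_API_KEY",  # placeholder, replace with your own key
    galaxy_url="http://localhost:8080",
    tool_shed_url="http://toolshed.g2.bx.psu.edu/",
    owner="rnateam",
    revision="e9b2400cc569",
    name="blockclust_workflow",
    panel_section="ChemicalToolBoX",
)
print(" ".join(command))
```

The resulting list can be passed to ``subprocess.call`` from inside the Galaxy directory, which avoids shell quoting problems with the API key.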
156
157
158 .. _API Key: http://wiki.galaxyproject.org/Admin/API#Generate_the_Admin_Account_API_Key
159 .. _`test tool shed`: http://testtoolshed.g2.bx.psu.edu/
160
161
162 Installation via web browser
163 ============================
164
165 - go to the `admin page`_
166 - select *Search and browse tool sheds*
167 - Galaxy test tool shed > Sequence Analysis > blockclust_workflow
168 - install chemicaltoolbox
169
170 .. _admin page: http://localhost:8080/admin
171
172
173
174 ===============
175 Troubleshooting
176 ===============
177
178 If you have any trouble or the installation did not finish properly, do not hesitate to contact me. However, if the
179 installation fails during the Galaxy installation, you can have a look at the `Galaxy wiki`_. If the ChemicalToolBoX installation fails,
180 you can try to run::
181
182 python ./scripts/api/repair_tool_shed_repository.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow
183
184 That will rerun all failed installation routines. Alternatively, you can navigate to the ChemicalToolBoX repository in
185 your browser and repair manually:
186 Top Panel → Admin → Manage installed tool shed repositories → chemicaltoolbox → Repository Actions → Repair repository
187
188 ------
189
190
191 On slow computers and during the compilation of large software libraries, like R,
192 the Tool Shed can run into a timeout and kill the installation.
193 That problem is known and should be fixed in the near future.
194
195 If you encounter a timeout or a hang during the installation, you can increase the ``threadpool_kill_thread_limit`` in your universe_wsgi.ini file.
196
197
198 ------
199
200 **Database locking errors**
201
202 Please note that Galaxy uses a SQLite database by default. SQLite is not intended for production use.
203 With multiple users or complex components, like this workflow, you will see database locking errors.
204 We highly recommend using PostgreSQL for any kind of production system.
205
206
207 .. _Galaxy wiki: http://wiki.galaxyproject.org/
208
209
210 Workflows
211 =========
212
213 An example workflow is located in the `Tool Shed`::
214
215 http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow
216
217 You can install the workflow with the API::
218
219 python ./scripts/api/install_tool_shed_repositories.py --api YOUR_API_KEY -l http://localhost:8080 --url http://toolshed.g2.bx.psu.edu/ -o rnateam -r e9b2400cc569 --name blockclust_workflow --tool-deps --repository-deps --panel-section-name BlockClust
220
221 or via web browser as described above. You have now successfully installed the workflow.
222 To import it for all your users, go to the admin panel, choose the workflow and import it.
223 For more information have a look at the Galaxy wiki::
224
225 http://wiki.galaxyproject.org/ToolShedWorkflowSharing#Finding_workflows_in_tool_shed_repositories
226
227 Please **note** that Galaxy uses a SQLite database by default. SQLite is not intended for production use.
228 With multiple users or complex components, like this workflow, you will see database locking errors.
229 We highly recommend using PostgreSQL for any kind of production system.
230
21 231
22 232
23 Sample Data 233 Sample Data
24 =========== 234 ===========
25 235
26 As an example, we will use the first public assembly of the 2011 Shiga-toxin
27 producing *Escherichia coli* O104:H4 outbreak in Germany. This was part of the
28 open-source crowd-sourcing analysis described in Rohde et al. (2011) and here:
29 https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
30
31 You can upload this assembly directly into Galaxy using the "Upload File" tool
32 with either of these URLs - Galaxy should recognise this is a FASTA file with
33 3,057 sequences:
34
35 * http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
36 * https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt
37
38 This FASTA file ``TY2482.fasta.txt`` was the initial TY-2482 strain assembled
39 by Nick Loman from 5 runs of Ion Torrent data released by the BGI, using the
40 MIRA 3.2 assembler. It was initially released via his blog,
41 http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
42
43 We will also need a training set of known *E. coli* genes, for example the
44 model strain *Escherichia coli* str. K-12 substr. MG1655 which is well
45 annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
46 gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
47 should recognise as a FASTA file with 4,321 sequences:
48
49 * ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn
50
51 Then run the workflow, which should produce 2,333 predicted genes for the
52 TY2482 assembly (two FASTA files, nucleotide and protein sequences).
53 236
54 237
55 Citation 238 Citation
56 ======== 239 ========
57 240
59 wrappers for Galaxy, in work leading to a scientific publication, 242 wrappers for Galaxy, in work leading to a scientific publication,
60 please cite: 243 please cite:
61 244
62 P. Videm et al... 245 P. Videm et al...
63 246
64 For Glimmer3 please cite:
65
66 Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
67 Identifying bacterial genes and endosymbiont DNA with Glimmer.
68 Bioinformatics 23(6), 673-679.
69 http://dx.doi.org/10.1093/bioinformatics/btm009
70
71 For EMBOSS please cite:
72
73 Rice, P., Longden, I. and Bleasby, A. (2000)
74 EMBOSS: The European Molecular Biology Open Software Suite
75 Trends in Genetics 16(6), 276-277.
76 http://dx.doi.org/10.1016/S0168-9525(00)02024-2
77 247
78 248
79 Additional References 249 Additional References
80 ===================== 250 =====================
81 251
82 Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
83 Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
84 New England Journal of Medicine 365, 718-724.
85 http://dx.doi.org/10.1056/NEJMoa1107643
86 252
87 253
88 Availability 254 Availability
89 ============ 255 ============
90 256
91 This workflow is available on the main Galaxy Tool Shed: 257 This workflow is available on the main Galaxy Tool Shed:
92 258
93 http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow 259 http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow
94 260
95 Development is being done on github: 261 Development is being done on github:
96 262
97 https://github.com/bgruening/galaxytools/workflows/glimmer3/ 263 https://github.com/bgruening/galaxytools/tree/master/workflows/blockclust
98 264
99 265
100 Dependencies 266 Dependencies
101 ============ 267 ============
102 268
103 These dependencies should be resolved automatically via the Galaxy Tool Shed: 269 These dependencies should be resolved automatically via the Galaxy Tool Shed:
104 270
105 * http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3 271 * http://testtoolshed.g2.bx.psu.edu/view/iuc/package_samtools_0_1_19
106 * http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5 272 * http://testtoolshed.g2.bx.psu.edu/view/iuc/package_r_3_0_1
273 * http://testtoolshed.g2.bx.psu.edu/view/rnateam/package_segemehl_0_1_6
274 * http://testtoolshed.g2.bx.psu.edu/view/iuc/msa_datatypes
275 * http://testtoolshed.g2.bx.psu.edu/view/iuc/package_infernal_1_1rc4
276 * http://testtoolshed.g2.bx.psu.edu/view/rnateam/blockbuster
277 * http://testtoolshed.g2.bx.psu.edu/view/bgruening/package_eden_1_1
278 * http://testtoolshed.g2.bx.psu.edu/view/iuc/package_mcl_12_135