comparison README.txt @ 0:b6211faea403 draft

planemo upload for repository https://github.com/mvdbeek/docker_scriptrunner/ commit ae672027942a606c1a5e302348279a5493151c11-dirty
author mvdbeek
date Fri, 08 Jul 2016 15:09:10 -0400

# WARNING before you start
# Inspect carefully how this tool is used. If the tool has bugs, users may be able to break
# out of the container and mount files on the host system.

This is a fork of toolfactory that uses Docker to sandbox the generated script.
The system user that executes Galaxy tools must therefore be able to run Docker.
On Ubuntu you can do this by adding your galaxy user to the docker group (http://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo).
Assuming Galaxy runs as the user galaxy, this is the short form for installing Docker from the official Docker Ubuntu Trusty repository:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sudo sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get install lxc-docker
sudo gpasswd -a galaxy docker
sudo service docker restart

The Galaxy process may need to be restarted afterwards.
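
Whether the group change has taken effect can be checked from Python. This is a minimal sketch and not part of the tool: the function name docker_access_report is hypothetical, it relies on the Unix-only grp module, and it only looks at PATH and the group database, so it does not prove that the Docker daemon is reachable.

```python
import grp
import shutil


def docker_access_report(user, group="docker"):
    """Report whether the docker CLI is on PATH and whether `user` is
    listed as a member of `group` (hypothetical helper, Unix only)."""
    try:
        members = grp.getgrnam(group).gr_mem
    except KeyError:
        # The group does not exist on this machine.
        members = None
    return {
        "docker_on_path": shutil.which("docker") is not None,
        "group_exists": members is not None,
        # gr_mem lists supplementary members only; a user whose primary
        # group is `group` would not appear here.
        "user_in_group": members is not None and user in members,
    }


if __name__ == "__main__":
    print(docker_access_report("galaxy"))
```

Run it as any user on the Galaxy host; all three values should be True before the tool is tried.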

On OSX, you need boot2docker installed and available to the galaxy user.

Note that this can cause severe security problems if untrusted users can become the galaxy user.
If you want to use this tool, read and understand the following article:
https://docs.docker.com/articles/security/#docker-daemon-attack-surface

Work is ongoing, and some important features are missing, such as the ability to manage containers.
Currently, only a single container with pre-installed tools is available.

This is a beta-stage, potentially dangerous tool.

Please cite:
- http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
- van den Beek M., Antoniewski C., in preparation
if you use this tool in your published work.

*Short Story*

This is an unusual Galaxy tool that exposes unrestricted scripting to users of a Galaxy server,
allowing them to run scripts in R, python, sh and perl over input datasets,
writing a single new data set as output.

In addition, this tool optionally generates very simple new Galaxy tools that effectively
freeze the supplied script into a new, ordinary Galaxy tool that runs it over one or more input files,
working just like any other Galaxy tool for your users.

To use the ToolFactory, you should have prepared a script to paste into a text box,
and have a small test input ready to select from your history to test your new script.
There is an example in each scripting language on the Tool Factory form. You can just
cut and paste these to try it out - remember to select the right interpreter. You'll
also need to create a small test data set using the Galaxy history add new data tool.

If the script fails somehow, use the "redo" button on the tool output in your history to
recreate the form complete with the broken script. Fix the bug and execute again. Rinse, wash, repeat.

Once the script runs successfully, a new Galaxy tool that runs your script can be generated.
Select the "generate" option and supply some help text and names. The new tool will be
generated in the form of a new Galaxy datatype - toolshed.gz. As the name suggests,
it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.

Once it's in a ToolShed, it can be installed into any local Galaxy server from
the Galaxy administrative interface.

Once the new tool is installed, local users can run it - each time, the script that was supplied
when it was built will be executed with the input chosen from the user's history. In other words,
the tools you generate with the ToolFactory run just like any other Galaxy tool,
but run your script every time.

Tool factory tools are perfect for workflow components. One input, one output, no variables.

*Reasons to read further*

If you use Galaxy to support your research;

You and fellow users are sometimes forced to take data out of Galaxy, process it with ugly
little perl/awk/sed/R... scripts and put it back;

You do this when you can't do some transformation in Galaxy (the 90/10 rule);

You don't have enough developer resources for wrapping dozens of even relatively simple tools;

Your research and your institution would be far better off if those feral scripts were all tucked
safely in your local toolshed and Galaxy histories.

*The good news* If it can be trivially scripted, it can be running safely in your
local Galaxy via your own local toolshed in a few minutes - with functional tests.

*Value proposition* The ToolFactory allows Galaxy to efficiently take over most of your lab's
dark script matter, making it reproducible in Galaxy and shareable through the ToolShed.

That's what this tool does. You paste a simple script and the tool returns
a new, real Galaxy tool, ready to be installed from the local toolshed to local servers.
Scripts can be wrapped and online literally within minutes.

*To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
you should be a developer installing this tool on a private/personal/scratch local instance where you
are an admin_user. Then, if you break it, you get to keep all the pieces.
See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

**Installation**
This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
Find the Galaxy Test toolshed (not main) and search for the toolfactory repository.
Open it, review the code, and select the option to install it.

If you can't get the tool that way, the xml and py files here need to be copied into a new tools
subdirectory such as tools/toolfactory. Your tool_conf.xml needs a new entry pointing to the xml
file - something like::

    <section name="Tool building tools" id="toolbuilders">
        <tool file="DockerToolFactory.xml"/>
    </section>
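
For anyone who prefers to script the tool_conf.xml change, the following is a minimal sketch using only the Python standard library. The function name add_tool_section is hypothetical, and the sketch assumes the root element of tool_conf.xml is toolbox; check your own file before relying on it.

```python
import xml.etree.ElementTree as ET


def add_tool_section(tool_conf_xml, tool_file="DockerToolFactory.xml"):
    """Return tool_conf_xml with a 'toolbuilders' section that lists
    tool_file, adding the section and the entry only if missing."""
    root = ET.fromstring(tool_conf_xml)
    section = root.find("./section[@id='toolbuilders']")
    if section is None:
        section = ET.SubElement(
            root, "section",
            {"name": "Tool building tools", "id": "toolbuilders"},
        )
    already_listed = any(
        tool.get("file") == tool_file for tool in section.findall("tool")
    )
    if not already_listed:
        ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed minimal starting file; a real tool_conf.xml has more sections.
    print(add_tool_section("<toolbox></toolbox>"))
```

Running the function twice on the same text leaves a single entry, so it is safe to repeat after an upgrade.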

If not already there (I just added it to datatypes_conf.xml.sample), please add:

    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.

Ensure that html sanitization is set to False and uncommented in universe_wsgi.ini.

You'll have to restart the server for the new tool to be available.

R, python and perl are preloaded in the supplied Dockerfile.
Upon first execution the Dockerfile will be used to build an image
with various pre-installed tools.
Adding new ones should be easy enough, and follows standard conventions
for Docker tools.
Please make suggestions as bitbucket issues and code.
The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.

*What it does* This is a tool factory for simple scripts, currently in python, R and perl.
Functional tests are automatically generated.
On a technical level, a Docker container is started, and input and output files
are made available to the container.
After the script has run, the Docker container is terminated.
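
The container lifecycle described above can be pictured as a single docker run call. This is a sketch under stated assumptions, not the command this tool actually issues: the helper build_docker_command, the image name and the /data mount point are all hypothetical. Only the docker options --rm (remove the container when it exits) and -v (bind-mount a host directory) are standard.

```python
def build_docker_command(image, interpreter, host_dir, script, infile,
                         outfile, mount="/data"):
    """Build an argument list that runs `script` inside a throwaway
    container, with host_dir visible in the container at `mount`."""
    def inside(name):
        # Path of a file from host_dir as seen inside the container.
        return "{0}/{1}".format(mount, name)

    return [
        "docker", "run",
        "--rm",                                   # discard container on exit
        "-v", "{0}:{1}".format(host_dir, mount),  # share job files
        image,
        interpreter, inside(script), inside(infile), inside(outfile),
    ]


if __name__ == "__main__":
    # Hypothetical image name and job directory, for illustration only.
    cmd = build_docker_command(
        "toolfactory-image", "python", "/tmp/job1",
        "script.py", "in.tsv", "out.tsv",
    )
    print(" ".join(cmd))
```

Passing the list to subprocess.run avoids shell quoting problems with unusual file names.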

LIMITED to simple scripts that read inputs from the history.
A script can optionally write one new history dataset, and optionally collect any number of outputs into links on an autogenerated HTML
index page for the user to navigate - useful if the script writes images and output files. Pdf outputs
are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so ghostscript and imagemagick need to
be available.
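
The idea of an autogenerated index page can be sketched in a few lines. This is an illustration only, not the tool's own HTML code: the function write_index is hypothetical, and it leaves out the thumbnail and pdf-shrinking steps that need imagemagick and ghostscript.

```python
import html
import os
from urllib.parse import quote


def write_index(out_dir, title="Script outputs"):
    """Write index.html in out_dir with one link per other file there.
    Returns the sorted list of linked file names."""
    names = sorted(
        name for name in os.listdir(out_dir)
        if name != "index.html"
        and os.path.isfile(os.path.join(out_dir, name))
    )
    links = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(quote(name), html.escape(name))
        for name in names
    )
    page = (
        "<html><head><title>{0}</title></head><body>\n"
        "<h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body></html>\n"
    ).format(html.escape(title), links)
    with open(os.path.join(out_dir, "index.html"), "w") as handle:
        handle.write(page)
    return names
```

Calling write_index on the directory a script wrote into produces a page whose links are relative, so the directory can be moved as a unit.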

Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up, since
a generated tool gets you a serious leg up to a more complex one.

*What you do* You paste and run your script,
you fix the syntax errors, and eventually it runs.
You can use the redo button and edit the script before
trying to rerun it as you debug - it works pretty well.

Once the script works on some test data, you can
generate a toolshed compatible gzip file
containing your script ready to run as an ordinary Galaxy tool in a
repository on your local toolshed. That means safe and largely automated installation in any
production Galaxy configured to use your toolshed.
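
To make the "toolshed compatible gzip file" less abstract, here is a minimal sketch that packs a script and a tool XML file into a gzipped tar archive with the standard tarfile module. The function make_tool_archive and the directory layout inside the archive are assumptions for illustration; the archive this tool really generates may contain more files and a different layout.

```python
import io
import tarfile


def make_tool_archive(archive_path, tool_name, script_text, xml_text,
                      script_ext=".py"):
    """Write a gzipped tar archive holding the script and its tool XML
    under a directory named after the tool. Returns the member names."""
    members = {
        "{0}/{0}{1}".format(tool_name, script_ext): script_text,
        "{0}/{0}.xml".format(tool_name): xml_text,
    }
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, text in members.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding
            tar.addfile(info, io.BytesIO(data))
    return sorted(members)
```

Building the members in memory keeps the sketch free of temporary files; a real archive would be written from the files the tool generated.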

*Generated tool Security* Once you install a generated tool, it's just
another tool - assuming the script is safe. Generated tools run normally and their users cannot do anything unusually insecure,
but please, practice safe toolshed.
Read the fucking code before you install any tool.
Especially this one - it is really scary.

If you opt for an HTML output, you get all the script outputs arranged
as a single HTML history item - all output files are linked, with thumbnails for all the pdfs.
Ugly but really inexpensive.

Patches and suggestions are welcome as bitbucket issues, please.


Copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012

All rights reserved.
Licensed under the LGPL. If you want to improve it, feel free: https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below - we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs
- even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics
tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (eg transpose a
tabular file) and takes parameters the way Galaxy supplies them (see example below), they:

1. Install the tool factory on a personal private instance.

2. Upload a small test data set.

3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right -
there is absolutely no reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run it again.

5. A toolshed style gzip appears, ready to upload and install like any other Toolshed entry.

6. Upload the new tool to the toolshed.

7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production galaxy.

**Simple examples on the tool form**

A simple Rscript "filter" showing how the command line parameters can be handled. It takes an input file,
does something (transpose in this case) and writes the results to a new tabular file::

    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]   # input path supplied by Galaxy
    outf = ourargs[2]  # output path supplied by Galaxy
    inp = read.table(inf,header=FALSE,row.names=NULL,sep='\t')
    outp = t(inp)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=FALSE)

Calculate a multiple-test adjusted p value from a column of p values. For this script to be useful,
the column variable in the code must point at the right column for the
input file type(s) specified when the tool is generated::

    # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,header=TRUE,row.names=NULL,sep='\t')
    p = inp[,column]
    q = p.adjust(p,method=fdrmeth)
    newval = paste(fdrmeth,'p-value',sep='_')
    q = data.frame(q)
    names(q) = newval
    outp = cbind(inp,newval=q)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=TRUE)
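
For readers unsure what the 'BH' method computes, the following pure Python sketch reproduces the Benjamini-Hochberg adjustment. It is an illustration, not part of the tool form; the function name bh_adjust is hypothetical, and missing values are not handled.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    n = len(pvalues)
    # Visit p values from largest to smallest so a running minimum
    # keeps the adjusted values monotone.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = n - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each p value is scaled by n divided by its rank, then capped at 1 and at the adjusted value of the next larger p value.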

Another Rscript example without any input file - it generates a random heatmap pdf. You must make sure the option to create an HTML output file is
turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::

    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
    bar = as.matrix(foo)
    pdf( "heattest.pdf" )
    heatmap(bar,main='Random Heatmap')
    dev.off()

A Python example that reverses each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::

    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]   # input path supplied by Galaxy
    outp = sys.argv[2]  # output path supplied by Galaxy
    with open(inp,'r') as i, open(outp,'w') as o:
        for row in i:
            rs = row.rstrip().split('\t')
            rs.reverse()
            o.write('\t'.join(rs))
            o.write('\n')


**Galaxy as an IDE for developing API scripts**
If you need to develop Galaxy API scripts and you like to live dangerously, please read on.

**Galaxy as an IDE?**
Amazingly enough, blend-lib API scripts run perfectly well *inside* Galaxy when pasted into a Tool Factory form. No need to generate a new tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously, it is actually quite usable.

**Why bother - what's wrong with Eclipse?**
Nothing. But compared with developing API scripts in the usual way outside Galaxy, you get persistence and other framework benefits. You also get, at absolutely no extra charge, a ginormous security problem if you share the history or any outputs, because they contain the API script with its key. So development servers only, please!

**Workflow**
Fire up the Tool Factory in Galaxy.

Leave the input box empty, set the interpreter to python, paste and run an API script - eg the working example below (substitute the url and key).

It took me a few iterations to develop the example below because I know almost nothing about the API. I started with very simple code from one of the samples and after each run, the (edited..) API script is conveniently recreated using the redo button on the history output item. So each successive version of the developing API script you run is persisted - ready to be edited and rerun easily. It is *very* handy to be able to add a line of code to the script and run it, then view the output to (eg) inspect dicts returned by API calls, which helps you move progressively deeper.

Give the example below a whirl on a private clone (install the tool factory from the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.

Example tool factory API script::

    import sys
    from blend.galaxy import GalaxyInstance
    ourGal = 'http://x.x.x.x:xxxx'  # substitute your Galaxy url
    ourKey = 'xxx'                  # substitute your API key
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id': u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
    # the input box is empty, so only the output path in sys.argv[2] is used
    outf=open(sys.argv[2],'w')
    outf.write('\n'.join(res))
    outf.close()

**Attribution**
Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**
Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL

**Obligatory screenshot**

http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png