comparison README.txt @ 0:2ac21c27018a draft default tip
Uploaded
| author | iuc |
|---|---|
| date | Tue, 07 Apr 2015 10:29:55 -0400 |
| parents | |
| children |
# WARNING before you start
# Install this tool on a private Galaxy ONLY
# Please NEVER on a public or production instance
# updated August 2014 by John Chilton adding citation support
#
# updated August 8 2014 to fix bugs reported by Marius van den Beek
# please cite the resource at
http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
# if you use this tool in your published work.

*Short Story*

This is an unusual Galaxy tool capable of generating new Galaxy tools.
It works by exposing *unrestricted* and therefore extremely dangerous scripting
to all designated administrators of the host Galaxy server, allowing them to
run scripts in R, python, sh and perl over multiple selected input data sets,
writing a single new data set as output.

*Differences between TF2 and the original Tool Factory*

1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
for the new tool. If these are editable, the user can change them; otherwise they are passed
as fixed and invisible parameters for each execution. Obviously, there are substantial security
implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
parameter sanitization, so you may need to "unsanitize" characters - eg translate all "__lt__"
into "<" for certain parameters where that is needed. Please practise safe toolshed.

2. Any number of input files (all of the same datatype) may be defined.

These changes substantially complicate the way your script is supplied with
all the new and variable parameters. Examples in each scripting language are shown
in the tool help.

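The "unsanitize" step mentioned in item 1 could look something like the minimal Python sketch below. Only the "__lt__" token comes from the text above; the other tokens in the table are assumptions about how Galaxy escapes characters, and the `unsanitize` helper is a hypothetical name, so check both against your own Galaxy before relying on them.

```python
# Assumed mapping from escaped tokens back to the original characters.
# Only "__lt__" is taken from this README; the rest are assumptions.
ESCAPES = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
    "__ob__": "[",
    "__cb__": "]",
}


def unsanitize(value, escapes=ESCAPES):
    """Return value with every escaped token replaced by its character."""
    for token, char in escapes.items():
        value = value.replace(token, char)
    return value


print(unsanitize("x __lt__ 5"))
```

Only unsanitize the specific parameters that need it, since undoing the escaping also undoes the protection it provides.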
*Automated outputs in named sections*

If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
and run logs to the current directory, the tool factory can optionally
auto-generate a linked Html page with separate sections showing a thumbnail
grid for all pdfs and the log text, grouping all artifacts sharing a file
name and log name prefix::

    eg: if "foo.log" is emitted then *all* other outputs matching foo_* will
    all be grouped together - eg
    foo_baz.pdf
    foo_bar.pdf and
    foo_zot.xls
    would all be displayed and linked in the same section with foo.log's contents
    - to form the "Foo" section of the Html page. Sections appear in alphabetic
    order and there are no limits on the number of files or sections.

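The grouping rule described above can be illustrated with a short Python sketch. This is not the Tool Factory's own code; the function name `group_outputs` and the exact matching rule (a file belongs to a section if its name starts with the log prefix followed by an underscore) are assumptions drawn from the foo.log example.

```python
import os


def group_outputs(filenames):
    """Group output files into sections keyed by the prefix of each .log file."""
    names = sorted(filenames)
    # Each "<prefix>.log" file defines one section; sections are alphabetic.
    prefixes = sorted(os.path.splitext(n)[0] for n in names if n.endswith(".log"))
    sections = {}
    for prefix in prefixes:
        members = [n for n in names if n.startswith(prefix + "_")]
        sections[prefix] = [prefix + ".log"] + members
    return sections


outputs = ["foo_baz.pdf", "foo.log", "foo_bar.pdf", "foo_zot.xls", "other.txt"]
for section, files in group_outputs(outputs).items():
    print(section, files)
```

Files that match no log prefix (other.txt here) are simply left out of every section in this sketch.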
*Automated generation of new Galaxy tools for installation into any Galaxy*

Once a script is working correctly, this tool optionally generates a
new Galaxy tool, effectively freezing the supplied script into a new,
ordinary Galaxy tool that runs it over one or more input files selected by
the user. Generated tools are installed via a tool shed by an administrator
and work exactly like all other Galaxy tools for your users.

If you use the Html output option, please ensure that sanitize_all_html is
set to False and uncommented in universe_wsgi.ini - it should show::

    # By default, all tool output served as 'text/html' will be sanitized
    sanitize_all_html = False

This opens potential security risks and may not be acceptable for public
sites where the lack of stylesheets may make Html pages damage onlookers'
eyeballs but should still be correct.


*More Detail*

To use the ToolFactory, you should have prepared a script to paste into a
text box, and a small test input example ready to select from your history
to test your new script.

There is an example in each scripting language on the Tool Factory form. You
can just cut and paste these to try it out - remember to select the right
interpreter please. You'll also need to create a small test data set using
the Galaxy history add new data tool.

If the script fails somehow, use the "redo" button on the tool output in
your history to recreate the form complete with broken script. Fix the bug
and execute again. Rinse, wash, repeat.

Once the script runs successfully, a new Galaxy tool that runs your script
can be generated. Select the "generate" option and supply some help text and
names. The new tool will be generated in the form of a new Galaxy datatype
- toolshed.gz - as the name suggests, it's an archive ready to upload to a
Galaxy ToolShed as a new tool repository.

Once it's in a ToolShed, it can be installed into any local Galaxy server
from the server administrative interface.

Once the new tool is installed, local users can run it - each time, the script
that was supplied when it was built will be executed with the input chosen
from the user's history. In other words, the tools you generate with the
ToolFactory run just like any other Galaxy tool, but run your script every time.

Tool factory tools are perfect for workflow components. One input, one output,
no variables.

*To fully and safely exploit the awesome power* of this tool,
Galaxy and the ToolShed, you should be a developer installing this
tool on a private/personal/scratch local instance where you are an
admin_user. Then, if you break it, you get to keep all the pieces - see
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

**Installation**
This is a Galaxy tool. You can install it most conveniently using the
administrative "Search and browse tool sheds" link. Find the Galaxy Main
toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
repository. Open it, review the code and select the option to install it.

If you can't get the tool that way, the xml and py files here need to be
copied into a new tools subdirectory such as tools/toolfactory. Your
tool_conf.xml needs a new entry pointing to the xml file - something like::

    <section name="Tool building tools" id="toolbuilders">
      <tool file="toolfactory/rgToolFactory.xml"/>
    </section>

If not already there (I just added it to datatypes_conf.xml.sample),
please add::

    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
    mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.

Of course, R, python, perl etc are needed on your path if you want to test
scripts using those interpreters. Adding new ones to this tool code should
be easy enough. Please make suggestions as bitbucket issues and code. The
HTML file code automatically shrinks R's bloated pdfs, and depends on
ghostscript. The thumbnails require imagemagick.

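Before testing scripts, it may help to confirm that the interpreters and helper programs mentioned above are actually on the path. The sketch below uses Python's standard shutil.which for that; the executable names in REQUIRED (Rscript, gs for ghostscript, convert for imagemagick) are assumptions and may differ on your system.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed executable names - adjust for your own installation.
REQUIRED = ["Rscript", "python", "perl", "sh", "gs", "convert"]

missing = missing_executables(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Run it as the same user, and with the same environment, that the Galaxy server uses to run jobs, since that is the path that matters.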
*Restricted execution*
The tool factory tool itself will then be usable ONLY by admin users -
people with IDs in admin_users in universe_wsgi.ini. **Yes, that's right. ONLY
admin_users can run this tool.** Think about it for a moment. If allowed to
run any arbitrary script on your Galaxy server, the only thing that would
impede a miscreant bent on destroying all your Galaxy data would probably
be lack of appropriate technical skills.

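To see who would be allowed to run the tool on a given server, the admin_users setting can be read straight from the ini file. The sketch below is only an illustration of that check, not how Galaxy or the Tool Factory enforces it; the section name [app:main], the comma-separated format and the sample addresses are all assumptions.

```python
import configparser


def admin_users(ini_text, section="app:main"):
    """Return the set of admin addresses listed in the given ini text."""
    # Interpolation is disabled so that '%' characters in other settings
    # of the file do not raise errors while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get(section, "admin_users", fallback="")
    return {user.strip().lower() for user in raw.split(",") if user.strip()}


def is_admin(email, ini_text):
    """True if email appears in admin_users (case-insensitive)."""
    return email.strip().lower() in admin_users(ini_text)


# Hypothetical ini fragment for illustration only.
SAMPLE = """
[app:main]
admin_users = alice@example.org, bob@example.org
"""

print(is_admin("bob@example.org", SAMPLE))
```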
*What it does* This is a tool factory for simple scripts in python, R and
perl currently. Functional tests are automatically generated. How cool is that.

LIMITED to simple scripts that read one input from the history. Optionally can
write one new history dataset, and optionally collect any number of outputs
into links on an autogenerated HTML index page for the user to navigate -
useful if the script writes images and output files - pdf outputs are shown
as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both
ghostscript and imagemagick need to be available.

Generated tools can be edited and enhanced like any Galaxy tool, so start
small and build up since a generated script gets you a serious leg up to a
more complex one.

*What you do* You paste and run your script, you fix the syntax errors and
eventually it runs. You can use the redo button and edit the script before
trying to rerun it as you debug - it works pretty well.

Once the script works on some test data, you can generate a toolshed compatible
gzip file containing your script ready to run as an ordinary Galaxy tool in
a repository on your local toolshed. That means safe and largely automated
installation in any production Galaxy configured to use your toolshed.

*Generated tool Security* Once you install a generated tool, it's just
another tool - assuming the script is safe. They just run normally and their
user cannot do anything unusually insecure but please, practice safe toolshed.
Read the fucking code before you install any tool. Especially this one -
it is really scary.

If you opt for an HTML output, you get all the script outputs arranged
as a single Html history item - all output files are linked, with thumbnails
for all the pdfs. Ugly but really inexpensive.

Patches and suggestions welcome as bitbucket issues please?

copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012

all rights reserved
Licensed under the LGPL if you want to improve it, feel free
https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below -
we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get
written, run and lost every day in most busy labs - even ours where Galaxy is
in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark
script gap - all those "small" bioinformatics tasks. Once a user has a working
R (or python or perl) script that does something Galaxy cannot currently do
(eg transpose a tabular file) and takes parameters the way Galaxy supplies
them (see example below), they:

1. Install the tool factory on a personal private instance

2. Upload a small test data set

3. Paste the script into the 'script' text box and iteratively run the
insecure tool on test data until it works right - there is absolutely no
reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run
it again.

5. A toolshed style gzip appears ready to upload and install like any other
Toolshed entry.

6. Upload the new tool to the toolshed

7. Ask the local admin to check the new tool to confirm it's not evil and
install it in the local production Galaxy

**Simple examples on the tool form**

A simple Rscript "filter" showing how the command line parameters can be
handled, takes an input file, does something (transpose in this case) and
writes the results to a new tabular file::

    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,header=FALSE,row.names=NULL,sep='\t')
    outp = t(inp)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=FALSE)

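For comparison, the same transpose filter can be sketched in Python, following the convention used by the examples in this README that the first command line argument is the input path and the second is the output path. The helper names `transpose_rows` and `main`, and the choice to pad short rows with empty strings, are assumptions made for this illustration.

```python
def transpose_rows(rows):
    """Transpose a list of rows, padding short rows with empty strings."""
    width = max((len(row) for row in rows), default=0)
    padded = [row + [""] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


def main(inf, outf):
    """Read a tab-separated file, transpose it and write the result."""
    with open(inf) as src:
        rows = [line.rstrip("\r\n").split("\t") for line in src]
    with open(outf, "w") as dst:
        for row in transpose_rows(rows):
            dst.write("\t".join(row) + "\n")


print(transpose_rows([["a", "b", "c"], ["1", "2", "3"]]))
```

When pasted into the script box, the script would end with a call such as main(sys.argv[1], sys.argv[2]) after an import sys at the top.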
Calculate a multiple test adjusted p value from a column of p values -
for this script to be useful, it needs the right column for the input to be
specified in the code for the given input file type(s) specified when the
tool is generated::

    # use p.adjust - assumes a HEADER row and column 1 - please fix for any
    # real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,header=TRUE,row.names=NULL,sep='\t')
    p = inp[,column]
    q = p.adjust(p,method=fdrmeth)
    newval = paste(fdrmeth,'p-value',sep='_')
    q = data.frame(q)
    names(q) = newval
    outp = cbind(inp,newval=q)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=TRUE)

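For readers wondering what the 'BH' adjustment in the script above computes, here is a minimal pure Python sketch of the Benjamini-Hochberg step-up procedure. It is an illustration only, with a hypothetical function name, and it does not handle missing values the way R's p.adjust does.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(pvalues)
    # Walk from the largest p value to the smallest, keeping a running
    # minimum so the adjusted values never increase as p decreases.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = n - offset  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each p value is scaled by n / rank and then capped by the adjusted value of the next larger p value, which is why the loop runs from largest to smallest.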
Another Rscript example without any input file - generates a random heatmap
pdf - you must make sure the option to create an HTML output file is
turned on for this to work. The heatmap will be presented as a thumbnail
linked to the pdf in the resulting HTML page::

    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),
                     e=runif(100),f=runif(100))
    bar = as.matrix(foo)
    pdf( "heattest.pdf" )
    heatmap(bar,main='Random Heatmap')
    dev.off()

A Python example that reverses the column order of each row of a tabular
file. You'll need to remove the leading spaces for this to work if cut and
pasted into the script box. Note that you can already do this in Galaxy by
setting up the cut columns tool with the correct number of columns in reverse
order, but this script will work for any number of columns so is completely
generic::

    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]
    outp = sys.argv[2]
    with open(inp, 'r') as i, open(outp, 'w') as o:
        for row in i:
            # strip only the line ending so empty trailing columns are kept
            rs = row.rstrip('\r\n').split('\t')
            rs.reverse()
            o.write('\t'.join(rs))
            o.write('\n')


**Galaxy as an IDE for developing API scripts**
If you need to develop Galaxy API scripts and you like to live dangerously,
please read on.

*Galaxy as an IDE?*
Amazingly enough, blend-lib API scripts run perfectly well *inside*
Galaxy when pasted into a Tool Factory form. No need to generate a new
tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously,
it is actually quite usable.

*Why bother - what's wrong with Eclipse*
Nothing. But, compared with developing API scripts in the usual way outside
Galaxy, you get persistence and other framework benefits plus, at absolutely
no extra charge, a ginormous security problem if you share the history or
any outputs, because they contain the api script with key - so development
servers only please!

*Workflow*
Fire up the Tool Factory in Galaxy.

Leave the input box empty, set the interpreter to python, paste and run an
api script - eg the working example (substitute the url and key) below.

It took me a few iterations to develop the example below because I know
almost nothing about the API. I started with very simple code from one of the
samples and after each run, the (edited..) api script is conveniently recreated
using the redo button on the history output item. So each successive version
of the developing api script you run is persisted - ready to be edited and
rerun easily. It is *very* handy to be able to add a line of code to the
script and run it, then view the output to (eg) inspect dicts returned by
API calls to help move progressively deeper.

Give the below a whirl on a private clone (install the tool factory from
the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.

Eg tool factory api script::

    import sys
    from blend.galaxy import GalaxyInstance
    ourGal = 'http://x.x.x.x:xxxx'
    ourKey = 'xxx'
    gi = GalaxyInstance(ourGal, key=ourKey)
    libs = gi.libraries.get_libraries()
    res = []
    # libs looks like
    # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
    # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
    for lib in libs:
        res.append('%s:\n' % lib['name'])
        res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
    # the second command line argument is the output file path
    with open(sys.argv[2],'w') as outf:
        outf.write('\n'.join(res))

**Attribution**
Creating re-usable tools from scripts: The Galaxy Tool Factory
Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573

http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref

**Licensing**
Copyright Ross Lazarus 2010
ross lazarus at g mail period com

All rights reserved.

Licensed under the LGPL

**Obligatory screenshot**

http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
