Galaxy wrappers for common unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.


Tools:

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut  - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.
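
As a rough illustration of the two uniq variants listed above, the following shell sketch contrasts deduplicating unsorted input with deduplicating already-sorted input. It assumes that unsorted_uniq accepts input in any order while sorted_uniq expects sorted input, as plain ``uniq`` does. The function names ``unsorted_unique`` and ``sorted_unique`` are hypothetical and are not taken from the wrappers.

```shell
# Hypothetical helper: drop repeated lines without sorting first.
# The first occurrence of each line is kept, in input order.
unsorted_unique() {
    awk '!seen[$0]++' "$@"
}

# Hypothetical helper: drop repeated lines from input that is already sorted.
# Plain uniq only collapses duplicates that are adjacent.
sorted_unique() {
    uniq "$@"
}
```

For example, ``printf 'b\na\nb\n' | unsorted_unique`` prints ``b`` and then ``a``, while the same input piped through ``sorted_unique`` comes out unchanged because the two ``b`` lines are not adjacent.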

A few improvements over the standard tools:

  * EasyJoin - A join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
  * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
  * Find_and_Replace - Find/Replace text in a line or specific column.
  * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
  * HTML'd Grep - grep text in a file and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
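
The idea behind a join that does not require pre-sorted input can be sketched with the standard tools alone: sort both files on the key, then join them. The helper below is a minimal illustration and not the EasyJoin implementation. The name ``join_unsorted`` is hypothetical, and the sketch assumes tab-separated files whose key is the first column.

```shell
# Hypothetical helper: join two tab-separated files on their first column
# without requiring the caller to sort them beforehand.
# Usage: join_unsorted LEFT_FILE RIGHT_FILE
join_unsorted() {
    tab=$(printf '\t')
    left_sorted=$(mktemp) || return 1
    right_sorted=$(mktemp) || return 1

    # sort and join must agree on the collation order, so force the C locale.
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$left_sorted"
    LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$right_sorted"

    LC_ALL=C join -t "$tab" "$left_sorted" "$right_sorted"
    status=$?

    # Clean up the temporary sorted copies and report join's exit status.
    rm -f "$left_sorted" "$right_sorted"
    return "$status"
}
```

Calling ``join_unsorted left.tsv right.tsv`` prints only the rows whose key appears in both files, in key order.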


Requirements
------------

1. Coreutils version 8.22 or later.
2. AWK version 4.0.1 or later.
3. SED version 4.2 *with* a special patch
4. Grep with PCRE support

These will be installed automatically via the Galaxy `Tool Shed`_.


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs).
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to:

 BEGIN { system("ls") }

will get an error (in Galaxy) saying:

 fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to:

 1els

will get an error (in Galaxy) saying:

 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode


That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
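
The sandbox behaviour described in this section can be checked from a shell. The sketch below uses a hypothetical helper, ``check_sandbox``, which is not part of this repository. It runs a command and looks on stderr for the "sandbox mode" wording of the error messages quoted above. It assumes that ``gawk`` and a ``sed`` build accepting ``--sandbox`` are on the PATH, and reports other cases separately.

```shell
# Hypothetical helper: run a command that the sandbox is expected to reject
# and report whether the "sandbox mode" error message appeared.
# Usage: check_sandbox COMMAND [ARGS...]
check_sandbox() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "unavailable"
        return 0
    fi
    # Capture stderr only; discard stdout and give the command an empty stdin.
    # The C locale keeps the error message untranslated.
    err=$(LC_ALL=C "$@" 2>&1 >/dev/null </dev/null) || true
    case "$err" in
        *"sandbox mode"*) echo "enforced" ;;
        *) echo "not enforced or unsupported" ;;
    esac
}

echo "gawk: $(check_sandbox gawk --sandbox 'BEGIN { system("ls") }')"
echo "sed:  $(check_sandbox sed --sandbox '1els')"
```

A result other than ``enforced`` means either that the tool is missing or that the installed build does not provide the sandbox option, for example a SED without the patch mentioned under Requirements.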

------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

- add shuf;
  we can remove the random feature from sort and use shuf instead
- move some advanced settings under a conditional, for example the cut tool offers to cut bytes
- the cut wrapper has some output conditional magic for interval files that needs to be checked
- comm wrapper, see the Galaxy default one
- evaluate the join wrappers against the Galaxy ones, maybe we should drop them
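
The first item rests on a difference between the two tools: GNU ``sort -R`` orders lines by a random hash of their keys, so identical lines always end up next to each other, whereas ``shuf`` produces a random permutation of all lines. A minimal sketch of shuf-based helpers follows. The function names ``shuffle_lines`` and ``sample_lines`` are hypothetical, and this is not the planned wrapper.

```shell
# Hypothetical helper: randomly permute the lines of the given files (or stdin).
shuffle_lines() {
    shuf "$@"
}

# Hypothetical helper: pick N random lines.
# Usage: sample_lines N [FILE...]
sample_lines() {
    count=$1
    shift
    shuf -n "$count" "$@"
}
```

Both helpers keep duplicate lines as separate entries, which is what a random-order feature usually needs.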


-------
License
-------

* Copyright (c) 2009-2013   A. Gordon  (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015   B. Gruening  (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
