comparison readme.rst @ 6:8928e6d1e7ba draft
Uploaded
| author | bgruening |
|---|---|
| date | Thu, 08 Jan 2015 09:07:31 -0500 |
| parents | 56e80527c482 |
| children | d9819ccb9ca7 |
| 5:3f0e0d4c15a9 | 6:8928e6d1e7ba |
|---|---|
| 6 further development was taken over by Bjoern Gruening. Feel free to contribute any general purpose | 6 further development was taken over by Bjoern Gruening. Feel free to contribute any general purpose |
| 7 text manipulation tool to this repository. | 7 text manipulation tool to this repository. |
| 8 | 8 |
| 9 | 9 |
| 10 Tools: | 10 Tools: |
| 11 ------ | |
| 11 | 12 |
| 12 * awk - The AWK programmning language ( http://www.gnu.org/software/gawk/ ) | 13 * awk - The AWK programmning language ( http://www.gnu.org/software/gawk/ ) |
| 13 * sed - Stream Editor ( http://sed.sf.net ) | 14 * sed - Stream Editor ( http://sed.sf.net ) |
| 14 * grep - Search files ( http://www.gnu.org/software/grep/ ) | 15 * grep - Search files ( http://www.gnu.org/software/grep/ ) |
| 15 * sort_columns - Sorting every line according to there columns | 16 * sort_columns - Sorting every line according to there columns |
| 16 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ): | 17 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ): |
| 17 | 18 |
| 18 * sort - sort files | 19 * sort - sort files |
| 19 * join - join two files, based on common key field. | 20 * join - join two files, based on common key field. |
| 20 * cut - keep/discard fields from a file | 21 * cut - keep/discard fields from a file |
| 21 * unsorted_uniq - keep unique/duplicated lines in a file | 22 * unsorted_uniq - keep unique/duplicated lines in a file |
| 22 * sorted_uniq - keep unique/duplicated lines in a file | 23 * sorted_uniq - keep unique/duplicated lines in a file |
| 23 * head - keep the first X lines in a file. | 24 * head - keep the first X lines in a file. |
| 24 * tail - keep the last X lines in a file. | 25 * tail - keep the last X lines in a file. |
| 26 * unfold_column - unfold a column with multiple entities into multiple lines | |
| 27 | |
| 25 | 28 |
| 26 Few improvements over the standard tools: | 29 Few improvements over the standard tools: |
| 30 ----------------------------------------- | |
| 27 | 31 |
| 28 * EasyJoin - A Join tool that does not require pre-sorted the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin ) | 32 * EasyJoin - A Join tool that does not require pre-sorted the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin ) |
| 29 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin ) | 33 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin ) |
| 30 * Find_and_Replace - Find/Replace text in a line or specific column. | 34 * Find_and_Replace - Find/Replace text in a line or specific column. |
| 31 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions. | 35 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions. |
| 32 * HTML'd Grep - grep text in a file, and produced high-lighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header ) | 36 * HTML'd Grep - grep text in a file, and produced high-lighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header ) |
| 33 | 37 |
| 34 | 38 |
| 35 Requirements | 39 Requirements: |
| 36 ------------ | 40 ------------- |
| 37 | 41 |
| 38 1. Coreutils vesion 8.22 or later. | 42 * Coreutils vesion 8.22 or later. |
| 39 2. AWK version 4.0.1 or later. | 43 * AWK version 4.0.1 or later. |
| 40 3. SED version 4.2 *with* a special patch | 44 * SED version 4.2 *with* a special patch |
| 41 4. Grep with PCRE support | 45 * Grep with PCRE support |
| 42 | 46 |
| 43 These will be installed automatically with the Galaxy `Tool Shed`_. | 47 All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing |
| 44 | 48 |
| 45 | 49 |
| 46 ------------------- | 50 ------------------- |
| 47 NOTE About Security | 51 NOTE About Security |
| 48 ------------------- | 52 ------------------- |
| 50 The included tools are secure (barring unintentional bugs): | 54 The included tools are secure (barring unintentional bugs): |
| 51 The main concern might be executing system commands with awk's "system" and sed's "e" commands, | 55 The main concern might be executing system commands with awk's "system" and sed's "e" commands, |
| 52 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands. | 56 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands. |
| 53 These commands are DISABLED using the "--sandbox" parameter to awk and sed. | 57 These commands are DISABLED using the "--sandbox" parameter to awk and sed. |
| 54 | 58 |
| 55 User trying to run an awk program similar to: | 59 User trying to run an awk program similar to:: |
| 56 | 60 |
| 57 BEGIN { system("ls") } | 61 BEGIN { system("ls") } |
| 58 | 62 |
| 59 Will get an error (in Galaxy) saying: | 63 Will get an error (in Galaxy) saying:: |
| 60 | 64 |
| 61 fatal: 'system' function not allowed in sandbox mode. | 65 fatal: 'system' function not allowed in sandbox mode. |
| 62 | 66 |
| 63 User trying to run a SED program similar to: | 67 User trying to run a SED program similar to:: |
| 64 | 68 |
| 65 1els | 69 1els |
| 66 | 70 |
| 67 will get an error (in Galaxy) saying: | 71 will get an error (in Galaxy) saying:: |
| 68 | 72 |
| 69 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode | 73 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode |
| 70 | |
| 71 | 74 |
| 72 That being said, if you do find some vulnerability in these tools, please let me know and I'll try fix them. | 75 That being said, if you do find some vulnerability in these tools, please let me know and I'll try fix them. |
| 73 | 76 |
| 74 ------------ | 77 ------------ |
| 75 Installation | 78 Installation |
| 76 ------------ | 79 ------------ |
| 77 | 80 |
| 78 Should be done via the Galaxy `Tool Shed`_. | 81 Should be done via the Galaxy `Tool Shed`_. |
| 82 Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing | |
| 79 | 83 |
| 80 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed | 84 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed |
| 81 | 85 |
| 82 | 86 |
| 83 ---- | 87 ---- |
| 84 TODO | 88 TODO |
| 85 ---- | 89 ---- |
| 86 | 90 |
| 87 - add shuf | 91 * add shuf, we can remove the random feature from sort and use shuf instead |
| 88 we can remove the random feature from sort and use shuf instead | 92 * move some advanced settings under a conditional, for example the cut tools offers to cut bytes |
| 89 - move some advanced settings under a conditional, for example the cut tools offers to cut bytes | 93 * cut wrapper has some output conditional magic for interval files, that needs to be checked |
| 90 - cut wrapper has some output conditional magic for interval files, that needs to be checked | 94 * comm wrapper, see the Galaxy default one |
| 91 - comm wrapper, see the Galaxy default one | 95 * evaluate the join wrappers against the Galaxy ones, maybe we should drop them |
| 92 - evaluate the join wrappers against the Galaxy ones, maybe we should drop them | |
| 93 | 96 |
| 94 | 97 |
| 95 ------- | 98 ------- |
| 96 License | 99 License |
| 97 ------- | 100 ------- |
| 98 | 101 |
| 99 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu) | 102 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu) |
| 100 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com) | 103 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com) |
| 101 | 104 |
| 102 | 105 |
| 103 Permission is hereby granted, free of charge, to any person obtaining | 106 Permission is hereby granted, free of charge, to any person obtaining |
| 104 a copy of this software and associated documentation files (the | 107 a copy of this software and associated documentation files (the |
| 105 "Software"), to deal in the Software without restriction, including | 108 "Software"), to deal in the Software without restriction, including |
