comparison readme.rst @ 6:8928e6d1e7ba draft

Uploaded
author bgruening
date Thu, 08 Jan 2015 09:07:31 -0500
parents 56e80527c482
children d9819ccb9ca7
5:3f0e0d4c15a9 6:8928e6d1e7ba
6 further development was taken over by Bjoern Gruening. Feel free to contribute any general purpose 6 further development was taken over by Bjoern Gruening. Feel free to contribute any general purpose
7 text manipulation tool to this repository. 7 text manipulation tool to this repository.
8 8
9 9
10 Tools: 10 Tools:
11 ------
11 12
12 * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ ) 13 * awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
13 * sed - Stream Editor ( http://sed.sf.net ) 14 * sed - Stream Editor ( http://sed.sf.net )
14 * grep - Search files ( http://www.gnu.org/software/grep/ ) 15 * grep - Search files ( http://www.gnu.org/software/grep/ )
15 * sort_columns - Sorting every line according to its columns 16 * sort_columns - Sorting every line according to its columns
16 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ): 17 * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
17 18
18 * sort - sort files 19 * sort - sort files
19 * join - join two files, based on common key field. 20 * join - join two files, based on common key field.
20 * cut - keep/discard fields from a file 21 * cut - keep/discard fields from a file
21 * unsorted_uniq - keep unique/duplicated lines in a file 22 * unsorted_uniq - keep unique/duplicated lines in a file
22 * sorted_uniq - keep unique/duplicated lines in a file 23 * sorted_uniq - keep unique/duplicated lines in a file
23 * head - keep the first X lines in a file. 24 * head - keep the first X lines in a file.
24 * tail - keep the last X lines in a file. 25 * tail - keep the last X lines in a file.
26 * unfold_column - unfold a column with multiple entities into multiple lines
27
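The new ``unfold_column`` entry can be pictured with a small Python sketch. This is not the wrapper's actual code: the function name, the 1-based ``column`` argument, the comma ``separator`` and the tab ``delimiter`` are all assumptions made for illustration.

```python
def unfold_column(lines, column, separator=",", delimiter="\t"):
    """Yield one output line per entity found in the chosen column.

    Assumptions (hypothetical, not taken from the tool): `column` is
    1-based, fields are split on `delimiter`, and the entities inside
    the chosen column are split on `separator`.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if column > len(fields):
            # The line has no such column: pass it through unchanged.
            yield delimiter.join(fields)
            continue
        before, after = fields[:column - 1], fields[column:]
        for entity in fields[column - 1].split(separator):
            yield delimiter.join(before + [entity] + after)


rows = ["gene1\tGO:1,GO:2\tchr1", "gene2\tGO:3\tchr2"]
for row in unfold_column(rows, column=2):
    print(row)
```

Each input line is repeated once per entity, with the other columns copied as they are.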
25 28
26 Few improvements over the standard tools: 29 Few improvements over the standard tools:
30 -----------------------------------------
27 31
28 * EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin ) 32 * EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
29 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin ) 33 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
30 * Find_and_Replace - Find/Replace text in a line or specific column. 34 * Find_and_Replace - Find/Replace text in a line or specific column.
31 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions. 35 * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
32 * HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header ) 36 * HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
33 37
34 38
35 Requirements 39 Requirements:
36 ------------ 40 -------------
37 41
38 1. Coreutils version 8.22 or later. 42 * Coreutils version 8.22 or later.
39 2. AWK version 4.0.1 or later. 43 * AWK version 4.0.1 or later.
40 3. SED version 4.2 *with* a special patch 44 * SED version 4.2 *with* a special patch
41 4. Grep with PCRE support 45 * Grep with PCRE support
42 46
43 These will be installed automatically with the Galaxy `Tool Shed`_. 47 All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
44 48
45 49
46 ------------------- 50 -------------------
47 NOTE About Security 51 NOTE About Security
48 ------------------- 52 -------------------
50 The included tools are secure (barring unintentional bugs): 54 The included tools are secure (barring unintentional bugs):
51 The main concern might be executing system commands with awk's "system" and sed's "e" commands, 55 The main concern might be executing system commands with awk's "system" and sed's "e" commands,
52 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands. 56 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
53 These commands are DISABLED using the "--sandbox" parameter to awk and sed. 57 These commands are DISABLED using the "--sandbox" parameter to awk and sed.
54 58
55 User trying to run an awk program similar to: 59 User trying to run an awk program similar to::
56 60
57 BEGIN { system("ls") } 61 BEGIN { system("ls") }
58 62
59 will get an error (in Galaxy) saying: 63 will get an error (in Galaxy) saying::
60 64
61 fatal: 'system' function not allowed in sandbox mode. 65 fatal: 'system' function not allowed in sandbox mode.
62 66
63 User trying to run a SED program similar to: 67 User trying to run a SED program similar to::
64 68
65 1els 69 1els
66 70
67 will get an error (in Galaxy) saying: 71 will get an error (in Galaxy) saying::
68 72
69 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode 73 sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
70
71 74
72 That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it. 75 That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.
73 76
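The awk sandbox behaviour described in this section can be checked outside Galaxy with a short Python sketch. It assumes a GNU awk binary named ``gawk`` is on the PATH; the helper name ``run_sandboxed_awk`` is hypothetical, and the way the Galaxy wrappers actually invoke awk is not shown in this README.

```python
import shutil
import subprocess


def run_sandboxed_awk(program):
    """Run `program` under `gawk --sandbox`.

    Returns the CompletedProcess, or None when no `gawk` binary is
    found on the PATH (an assumption of this sketch).
    """
    gawk = shutil.which("gawk")
    if gawk is None:
        return None
    return subprocess.run(
        [gawk, "--sandbox", program],
        stdin=subprocess.DEVNULL,  # the program needs no input
        capture_output=True,
        text=True,
    )


# The same awk program the README uses as its example.
result = run_sandboxed_awk('BEGIN { system("ls") }')
if result is None:
    print("gawk not found; nothing to check")
else:
    print("exit status:", result.returncode)
    print(result.stderr.strip())
```

With sandbox mode active, gawk should refuse the program and exit with a non-zero status instead of running ``ls``.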
74 ------------ 77 ------------
75 Installation 78 Installation
76 ------------ 79 ------------
77 80
78 Should be done via the Galaxy `Tool Shed`_. 81 Should be done via the Galaxy `Tool Shed`_.
82 Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
79 83
80 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed 84 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed
81 85
82 86
83 ---- 87 ----
84 TODO 88 TODO
85 ---- 89 ----
86 90
87 - add shuf 91 * add shuf, we can remove the random feature from sort and use shuf instead
88 we can remove the random feature from sort and use shuf instead 92 * move some advanced settings under a conditional, for example the cut tool offers to cut bytes
89 - move some advanced settings under a conditional, for example the cut tool offers to cut bytes 93 * cut wrapper has some output conditional magic for interval files, that needs to be checked
90 - cut wrapper has some output conditional magic for interval files, that needs to be checked 94 * comm wrapper, see the Galaxy default one
91 - comm wrapper, see the Galaxy default one 95 * evaluate the join wrappers against the Galaxy ones, maybe we should drop them
92 - evaluate the join wrappers against the Galaxy ones, maybe we should drop them
93 96
94 97
95 ------- 98 -------
96 License 99 License
97 ------- 100 -------
98 101
99 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu) 102 * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
100 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com) 103 * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)
101 104
102 105
103 Permission is hereby granted, free of charge, to any person obtaining 106 Permission is hereby granted, free of charge, to any person obtaining
104 a copy of this software and associated documentation files (the 107 a copy of this software and associated documentation files (the
105 "Software"), to deal in the Software without restriction, including 108 "Software"), to deal in the Software without restriction, including