diff readme.rst @ 6:8928e6d1e7ba draft
Uploaded
author | bgruening |
---|---|
date | Thu, 08 Jan 2015 09:07:31 -0500 |
parents | 56e80527c482 |
children | d9819ccb9ca7 |
--- a/readme.rst Wed Jan 07 11:15:41 2015 -0500
+++ b/readme.rst Thu Jan 08 09:07:31 2015 -0500
@@ -8,12 +8,13 @@
 
 
 Tools:
+------
 
-* awk - The AWK programmning language ( http://www.gnu.org/software/gawk/ )
-* sed - Stream Editor ( http://sed.sf.net )
-* grep - Search files ( http://www.gnu.org/software/grep/ )
-* sort_columns - Sorting every line according to there columns
-* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
+ * awk - The AWK programmning language ( http://www.gnu.org/software/gawk/ )
+ * sed - Stream Editor ( http://sed.sf.net )
+ * grep - Search files ( http://www.gnu.org/software/grep/ )
+ * sort_columns - Sorting every line according to there columns
+ * GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
 
   * sort - sort files
   * join - join two files, based on common key field.
@@ -22,8 +23,11 @@
   * sorted_uniq - keep unique/duplicated lines in a file
   * head - keep the first X lines in a file.
   * tail - keep the last X lines in a file.
+  * unfold_column - unfold a column with multiple entities into multiple lines
+
 
 Few improvements over the standard tools:
+-----------------------------------------
 
 * EasyJoin - A Join tool that does not require pre-sorted the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
 * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
@@ -32,15 +36,15 @@
 * HTML'd Grep - grep text in a file, and produced high-lighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
 
 
-Requirements
------------- 
+Requirements:
+-------------
 
-1. Coreutils vesion 8.22 or later.
-2. AWK version 4.0.1 or later.
-3. SED version 4.2 *with* a special patch
-4. Grep with PCRE support
+ * Coreutils vesion 8.22 or later.
+ * AWK version 4.0.1 or later.
+ * SED version 4.2 *with* a special patch
+ * Grep with PCRE support
 
-These will be installed automatically with the Galaxy `Tool Shed`_.
+All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
 
 
 -------------------
@@ -52,23 +56,22 @@
 or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
 These commands are DISABLED using the "--sandbox" parameter to awk and sed.
 
-User trying to run an awk program similar to:
+User trying to run an awk program similar to::
 
     BEGIN { system("ls") }
 
-Will get an error (in Galaxy) saying:
+Will get an error (in Galaxy) saying::
 
     fatal: 'system' function not allowed in sandbox mode.
 
-User trying to run a SED program similar to:
+User trying to run a SED program similar to::
 
     1els
 
-will get an error (in Galaxy) saying:
+will get an error (in Galaxy) saying::
 
     sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
 
-
 That being said, if you do find some vulnerability in these tools, please let me know and I'll try fix them.
 
 ------------
@@ -76,6 +79,7 @@
 ------------
 
 Should be done via the Galaxy `Tool Shed`_.
+Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing
 
 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed
 
@@ -84,20 +88,19 @@
 TODO
 ----
 
-- add shuf - we can remove the random feature from sort and use shuf instead
-- move some advanced settings under a conditional, for example the cut tools offers to cut bytes
-- cut wrapper has some output conditional magic for interval files, that needs to be checked
-- comm wrapper, see the Galaxy default one
-- evaluate the join wrappers against the Galaxy ones, maybe we should drop them
+ * add shuf, we can remove the random feature from sort and use shuf instead
+ * move some advanced settings under a conditional, for example the cut tools offers to cut bytes
+ * cut wrapper has some output conditional magic for interval files, that needs to be checked
+ * comm wrapper, see the Galaxy default one
+ * evaluate the join wrappers against the Galaxy ones, maybe we should drop them
 
 -------
 License
 -------
 
-* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
-* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)
+ * Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
+ * Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)
 
 Permission is hereby granted, free of charge, to any person obtaining