comparison replace_text_in_column.xml @ 6:8928e6d1e7ba draft

Uploaded
author bgruening
date Thu, 08 Jan 2015 09:07:31 -0500
parents 56e80527c482
children d64eace4f9f3
5:3f0e0d4c15a9 6:8928e6d1e7ba
5 </macros> 5 </macros>
6 <expand macro="requirements"> 6 <expand macro="requirements">
7 <requirement type="package" version="4.1.0">gnu_awk</requirement> 7 <requirement type="package" version="4.1.0">gnu_awk</requirement>
8 </expand> 8 </expand>
9 <version_command>awk --version | head -n 1</version_command> 9 <version_command>awk --version | head -n 1</version_command>
10 <command interpreter="sh"> 10 <command>
11 <![CDATA[ 11 <![CDATA[
12 ##adapt to awk's quirks - to pass an actual backslash - two backslashes are required (just like in a C string)
13 REPLACE_PATTERN=\${$replace_pattern//\\/\\\\};
14 awk 12 awk
15 -v OFS="\t" 13 -v OFS=" "
16 --re-interval 14 --re-interval
17 --sandbox "{ \$$column = gensub( /$find_pattern/, \"$replace_pattern\", \"g\", \$$column ) ; print \$0 ; }" 15 --sandbox '{ \$$column = gensub( /$find_pattern/, "$replace_pattern", "g", \$$column ) ; print \$0 ; }'
18 "$infile" 16 "$infile"
19 > "$output" 17 > "$outfile"
20 ]]> 18 ]]>
21 </command> 19 </command>
22 <inputs> 20 <inputs>
23 <param format="tabular" name="infile" type="data" label="File to process" /> 21 <param format="tabular" name="infile" type="data" label="File to process" />
24 <param name="column" label="in column" type="data_column" data_ref="infile" accept_default="true" /> 22 <param name="column" label="in column" type="data_column" data_ref="infile" accept_default="true" />
37 </valid> 35 </valid>
38 </sanitizer> 36 </sanitizer>
39 </param> 37 </param>
40 </inputs> 38 </inputs>
41 <outputs> 39 <outputs>
42 <data format="input" name="output" metadata_source="infile" /> 40 <data name="outfile" format_source="infile" metadata_source="infile" />
43 </outputs> 41 </outputs>
44 <tests> 42 <tests>
45 <test> 43 <test>
46 <param name="infile" value="replace_text_in_column_in1.txt" ftype="tabular" /> 44 <param name="infile" value="replace_text_in_column1.txt" ftype="tabular" />
47 <param name="column" value="4" /> 45 <param name="column" value="4" />
48 <param name="find_pattern" value=".+_(R.)" /> 46 <param name="find_pattern" value=".+_(R.)" />
49 <param name="replace_pattern" value="\1" /> 47 <param name="replace_pattern" value="\\1" />
50 <output name="output" file="replace_text_in_column_output1.txt" /> 48 <output name="outfile" file="replace_text_in_column_results1.txt" />
51 </test> 49 </test>
52 </tests> 50 </tests>
53 <help> 51 <help>
54 <![CDATA[ 52 <![CDATA[
55 **What it does** 53 **What it does**
56 54
57 This tool performs find &amp; replace operation on a specified column in a given file. 55 This tool performs find & replace operation on a specified column in a given file.
58 56
59 .. class:: infomark 57 .. class:: infomark
60 58
61 The **pattern to find** uses the **extended regular** expression syntax (same as running 'awk --re-interval'). 59 The **pattern to find** uses the **extended regular** expression syntax (same as running 'awk --re-interval').
62 60
77 75
78 76
79 **Examples of Replace Patterns** 77 **Examples of Replace Patterns**
80 78
81 - **WORLD** The word 'WORLD' will be placed wherever the find pattern was found. 79 - **WORLD** The word 'WORLD' will be placed wherever the find pattern was found.
82 - **FOO-&amp;-BAR** Each time the find pattern is found, it will be surrounded with 'FOO-' at the beginning and '-BAR' at the end. **&amp;** (ampersand) represents the matched find pattern. 80 - **FOO-&-BAR** Each time the find pattern is found, it will be surrounded with 'FOO-' at the beginning and '-BAR' at the end. **&** (ampersand) represents the matched find pattern.
83 - **\\1** The text which matched the first parenthesis in the Find Pattern. 81 - **\\1** The text which matched the first parenthesis in the Find Pattern.
84 82
85 83
86 ----- 84 -----
87 85
95 ----- 93 -----
96 94
97 **Example 2** 95 **Example 2**
98 96
99 **Find Pattern:** ^(.{4}) 97 **Find Pattern:** ^(.{4})
100 **Replace Pattern:** &amp;\\t 98 **Replace Pattern:** &\\t
101 99
102 Find the first four characters in each line, and replace them with the same text, followed by a tab character. In practice - this will split the first line into two columns. This operation affects only the selected column. 100 Find the first four characters in each line, and replace them with the same text, followed by a tab character. In practice - this will split the first line into two columns. This operation affects only the selected column.
103 101
104 102
105 ----- 103 -----