comparison awk.xml @ 6:8928e6d1e7ba draft

Uploaded
author bgruening
date Thu, 08 Jan 2015 09:07:31 -0500
parents 56e80527c482
children d64eace4f9f3
5:3f0e0d4c15a9 6:8928e6d1e7ba
9 <version_command>awk --version | head -n 1</version_command> 9 <version_command>awk --version | head -n 1</version_command>
10 <command> 10 <command>
11 <![CDATA[ 11 <![CDATA[
12 awk 12 awk
13 --sandbox 13 --sandbox
14 -v FS=\$'\t' 14 -v FS=' '
15 -v OFS=\$'\t' 15 -v OFS=' '
16 --re-interval 16 --re-interval
17 -f '$awk_script' 17 -f "$awk_script"
18 "$input" 18 "$infile"
19 > "$output" 19 > "$outfile"
20 ]]> 20 ]]>
21 </command> 21 </command>
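The `<command>` above writes the user's AWK program to a config file. It then runs awk on the input, with tab as both the input (FS) and output (OFS) field separator. Below is a minimal sketch of the same invocation outside Galaxy, assuming a POSIX shell.

- The function name `run_awk_tool` is hypothetical.
- The gawk-only flags `--sandbox` and `--re-interval` are left out so the sketch runs under any POSIX awk.

```shell
# Hypothetical helper mirroring the tool's <command>: the AWK program is
# saved to a temporary file (standing in for Galaxy's configfile) and run
# with tab as both the input (FS) and output (OFS) field separator.
# gawk-only flags (--sandbox, --re-interval) are omitted on purpose.
run_awk_tool() {
    code=$1      # AWK program text (the tool's "code" parameter)
    infile=$2    # tab-separated input file
    script=$(mktemp) || return 1
    printf '%s\n' "$code" > "$script"
    awk -v FS='\t' -v OFS='\t' -f "$script" "$infile"
    status=$?
    rm -f "$script"
    return "$status"
}
```

For example, `run_awk_tool '$2>0.5 { print $2*9, $1 }' input.tsv > output.tsv` would mirror a tool run on a hypothetical `input.tsv`.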
22 <inputs> 22 <inputs>
23 <param format="txt" name="input" type="data" label="File to process" /> 23 <param name="infile" format="txt" type="data" label="File to process" />
24 <param name="url_paste" type="text" area="true" size="5x35" label="AWK Program" help=""> 24 <param name="code" type="text" area="true" size="5x35" label="AWK Program" help="">
25 <sanitizer> 25 <sanitizer>
26 <valid initial="string.printable"> 26 <valid initial="string.printable">
27 <remove value="&apos;"/> 27 <remove value="&apos;"/>
28 </valid> 28 </valid>
29 </sanitizer> 29 </sanitizer>
30 </param> 30 </param>
31 </inputs> 31 </inputs>
32 <configfiles> 32 <configfiles>
33 <configfile name="awk_script"> 33 <configfile name="awk_script">$code</configfile>
34 $url_paste
35 </configfile>
36 </configfiles> 34 </configfiles>
37 <outputs> 35 <outputs>
38 <data format="input" name="output" metadata_source="input"/> 36 <data name="outfile" format_source="infile" metadata_source="infile"/>
39 </outputs> 37 </outputs>
40 <tests> 38 <tests>
41 <test> 39 <test>
42 <param name="input" value="unix_awk_input1.txt" /> 40 <param name="infile" value="awk1.txt" />
43 <param name="awk_script" value="$2>0.5 { print $2*9, $1 }" /> 41 <!-- commas are not allowed in a value field; values containing a comma will be split -->
44 <output name="output" file="unix_awk_output1.txt" /> 42 <param name="code" value='$2>0.5 { print $2*9"\t"$1 }' />
45 </test> 43 <output name="outfile" file="awk_results1.txt" />
44 </test>
46 </tests> 45 </tests>
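The new test writes `print $2*9"\t"$1` instead of `print $2*9, $1`, because, per the comment in the test, Galaxy splits test values at commas. Since OFS is a tab, both forms should print the same line. A small sketch comparing them on hypothetical two-line input (the function names are illustrative only):

```shell
# With OFS set to a tab, a comma in print emits OFS between the values,
# so the comma form and the explicit "\t" concatenation print the same line.
with_comma() {
    printf 'a\t0.25\nb\t1.5\n' |
        awk -v FS='\t' -v OFS='\t' '$2>0.5 { print $2*9, $1 }'
}
with_tab_literal() {
    printf 'a\t0.25\nb\t1.5\n' |
        awk -v FS='\t' -v OFS='\t' '$2>0.5 { print $2*9"\t"$1 }'
}
```

In `$2*9"\t"$1`, multiplication binds tighter than string concatenation, so the expression is read as `($2*9) "\t" $1`.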
47
48 <help> 46 <help>
49 <![CDATA[ 47 <![CDATA[
50 **What it does** 48 **What it does**
51 49
52 This tool runs the unix **awk** command on the selected data file. 50 This tool runs the unix **awk** command on the selected data file.
76 The basic form of AWK program is:: 74 The basic form of AWK program is::
77 75
78 pattern { action 1; action 2; action 3; } 76 pattern { action 1; action 2; action 3; }
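As a sketch of the basic form, the program below has one pattern followed by three actions separated by semicolons. The input rows and the function name `basic_form_demo` are hypothetical.

```shell
# Pattern: first column equals "chr3".
# Actions: compute a length, build a label, print both (tab-separated).
basic_form_demo() {
    printf 'chr1\t100\t150\nchr3\t200\t260\n' |
        awk -v FS='\t' -v OFS='\t' \
            '$1 == "chr3" { len = $3 - $2; name = $1 ":" $2; print name, len }'
}
```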
79 77
80 78
81
82 **Pattern Examples** 79 **Pattern Examples**
83 80
84 - **$2 == "chr3"** will match lines whose second column is the string 'chr3' 81 - **$2 == "chr3"** will match lines whose second column is the string 'chr3'
85 - **$5-$4>23** will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23. 82 - **$5-$4>23** will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
86 - **/AG..AG/** will match lines that contain the regular expression **AG..AG** (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to GREP.) 83 - **/AG..AG/** will match lines that contain the regular expression **AG..AG** (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to GREP.)
87 - **$7 ~ /A{4}U/** will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.) 84 - **$7 ~ /A{4}U/** will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
88 - **10000 &lt; $4 &amp;&amp; $4 &lt; 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000 85 - **10000 < $4 && $4 < 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000
89 - If no pattern is specified, all lines match (meaning the **action** part will be executed on all lines). 86 - If no pattern is specified, all lines match (meaning the **action** part will be executed on all lines).
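The pattern examples can be tried against a small table. This sketch assumes:

- a hypothetical five-column, tab-separated sample (name, chromosome, strand, start, end);
- a hypothetical helper `match_names` that prints the first column of every row matching the AWK pattern passed to it.

An empty pattern matches every row.

```shell
# Hypothetical sample: name, chromosome, strand, start, end (tab-separated).
pattern_rows() {
    printf 'r1\tchr3\t+\t25000\t25050\n'
    printf 'r2\tchr1\t-\t15000\t15100\n'
    printf 'r3\tchr3\t+\t30000\t30010\n'
}
# Print the name (first column) of each row matching the AWK pattern in $1.
# Inside the double quotes, $1 is the shell argument and \$1 is AWK's field 1.
match_names() {
    pattern_rows | awk -v FS='\t' "$1 { print \$1 }"
}
```

For instance, `match_names '$2 == "chr3"'` should list `r1` and `r3`.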
90
91 87
92 88
93 **Action Examples** 89 **Action Examples**
94 90
95 - **{ print }** or **{ print $0 }** will print the entire input line (the line that matched in **pattern**). **$0** is a special marker meaning 'the entire line'. 91 - **{ print }** or **{ print $0 }** will print the entire input line (the line that matched in **pattern**). **$0** is a special marker meaning 'the entire line'.
96 - **{ print $1, $4, $5 }** will print only the first, fourth and fifth fields of the input line. 92 - **{ print $1, $4, $5 }** will print only the first, fourth and fifth fields of the input line.
97 - **{ print $4, $5-$4 }** will print the fourth column and the difference between the fifth and fourth columns. (If the fourth column is the start position and the fifth column is the end position, the output file will contain the start position and the length.) 93 - **{ print $4, $5-$4 }** will print the fourth column and the difference between the fifth and fourth columns. (If the fourth column is the start position and the fifth column is the end position, the output file will contain the start position and the length.)
98 - If no action part is specified (not even the curly brackets), the default action is to print the entire line. 94 - If no action part is specified (not even the curly brackets), the default action is to print the entire line.
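A sketch of two of the action examples on hypothetical tab-separated rows (chromosome, name, strand, start, end). Both function names are illustrative:

- `start_and_length` prints the start position and the length.
- `default_action` gives only a pattern, so each matching line is printed whole.

```shell
# Hypothetical rows: chromosome, name, strand, start, end.
action_rows() {
    printf 'chr1\tg1\t+\t100\t150\nchr3\tg2\t-\t200\t260\n'
}
# Action only (no pattern): runs on every line, printing start and length.
start_and_length() {
    action_rows | awk -v FS='\t' -v OFS='\t' '{ print $4, $5-$4 }'
}
# Pattern only (no action): the default action prints the whole line.
default_action() {
    action_rows | awk -v FS='\t' '$1 == "chr3"'
}
```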
99
100 95
101 96
102 **AWK's Regular Expression Syntax** 97 **AWK's Regular Expression Syntax**
103 98
104 An AWK pattern can select the lines that contain, or do not contain, a match to a given regular expression. A regular expression is a pattern describing a certain amount of text. 99 An AWK pattern can select the lines that contain, or do not contain, a match to a given regular expression. A regular expression is a pattern describing a certain amount of text.
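A sketch contrasting three kinds of regular-expression pattern:

- a whole-line match;
- its negation with `!`;
- a match on a single field.

The sequences and function names are hypothetical. `/AAAAU/` is written out instead of `/A{4}U/` because not every awk implementation supports interval expressions (the tool passes gawk's `--re-interval` for this).

```shell
# Hypothetical rows: sequence name, sequence (tab-separated).
seq_rows() {
    printf 's1\tCCAGTTAGCC\ns2\tAGAG\ns3\tAAAAU\n'
}
# Lines containing AG, any two characters, then AG (like grep).
containing() { seq_rows | awk -v FS='\t' '/AG..AG/ { print $1 }'; }
# Lines NOT containing that match (like grep -v).
not_containing() { seq_rows | awk -v FS='\t' '!/AG..AG/ { print $1 }'; }
# Regular expression applied to the second field only.
field_match() { seq_rows | awk -v FS='\t' '$2 ~ /AAAAU/ { print $1 }'; }
```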