text_processing: awk.xml comparison

comparison awk.xml @ 6:8928e6d1e7ba draft

Uploaded

author	bgruening
date	Thu, 08 Jan 2015 09:07:31 -0500
parents	56e80527c482
children	d64eace4f9f3

comparison

-:3f0e0d4c15a9
+:8928e6d1e7ba
 <version_command>awk --version | head -n 1</version_command>
 <command>
 <![CDATA[
 awk
 --sandbox
--v FS=\$'\t'
+-v FS='	'
--v OFS=\$'\t'
+-v OFS='	'
 --re-interval
--f '$awk_script'
+-f "$awk_script"
-"$input"
+"$infile"
-> "$output"
+> "$outfile"
 ]]>
 </command>
 <inputs>
-<param format="txt" name="input" type="data" label="File to process" />
+<param name="infile" format="txt" type="data" label="File to process" />
-<param name="url_paste" type="text" area="true" size="5x35" label="AWK Program" help="">
+<param name="code" type="text" area="true" size="5x35" label="AWK Program" help="">
 <sanitizer>
 <valid initial="string.printable">
 <remove value="&apos;"/>
 </valid>
 </sanitizer>
 </param>
 </inputs>
 <configfiles>
-<configfile name="awk_script">
+<configfile name="awk_script">$code</configfile>
-$url_paste
-</configfile>
 </configfiles>
 <outputs>
-<data format="input" name="output" metadata_source="input"/>
+<data name="outfile" format_source="infile" metadata_source="infile"/>
 </outputs>
 <tests>
 <test>
-<param name="input" value="unix_awk_input1.txt" />
+<param name="infile" value="awk1.txt" />
-<param name="awk_script" value="$2>0.5 { print $2*9, $1 }" />
+<!-- commas are not allowed in a value field. Values containing a comma will be split -->
-<output name="output" file="unix_awk_output1.txt" />
+<param name="code" value='$2>0.5 { print $2*9"\t"$1 }' />
-</test>
+<output name="outfile" file="awk_results1.txt" />
+</test>
 </tests>
 <help>
 <![CDATA[
 **What it does**
 This tool runs the unix **awk** command on the selected data file.
 The basic form of an AWK program is::
 pattern { action 1; action 2; action 3; }
 **Pattern Examples**
 - **$2 == "chr3"**  will match lines whose second column is the string 'chr3'
 - **$5-$4>23**  will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
 - **/AG..AG/** will match lines that contain the regular expression **AG..AG** (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to GREP.)
 - **$7 ~ /A{4}U/**  will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
-- **10000 &lt; $4 &amp;&amp; $4 &lt; 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000
+- **10000 < $4 && $4 < 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000
 - If no pattern is specified, all lines match (meaning the **action** part will be executed on all lines).
 **Action Examples**
 - **{ print }** or **{ print $0 }**   will print the entire input line (the line that matched in **pattern**). **$0** is a special marker meaning 'the entire line'.
 - **{ print $1, $4, $5 }** will print only the first, fourth and fifth fields of the input line.
 - **{ print $4, $5-$4 }** will print the fourth column and the difference between the fifth and fourth column. (If the fourth column was start-position in the input file, and the fifth column was end-position - the output file will contain the start-position, and the length).
 - If no action part is specified (not even the curly brackets) - the default action is to print the entire line.
 **AWK's Regular Expression Syntax**
 The select tool searches the data for lines containing or not containing a match to the given pattern. A Regular Expression is a pattern describing a certain amount of text.
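
The revised command and its test case can be tried outside Galaxy with a minimal sketch like the one below. It is not part of the changeset. The file names `sample.tsv` and `prog.awk` and the sample rows are hypothetical; only the AWK program is taken from the test in the diff. The `--sandbox` and `--re-interval` options are left out because they are GNU awk specific, and the separator is written as `'\t'` (which `-v` expands to a tab) instead of the literal tab character used in the tool.

```shell
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)

# Hypothetical input: two tab-separated columns (label, numeric value).
printf 'a\t0.2\nb\t0.6\nc\t1.5\n' > "$workdir/sample.tsv"

# The program from the test case in the diff, stored in a script file
# much as the tool stores $code in its awk_script configfile.
cat > "$workdir/prog.awk" <<'EOF'
$2>0.5 { print $2*9"\t"$1 }
EOF

# Tab as both input and output field separator, program read with -f.
result=$(awk -v FS='\t' -v OFS='\t' -f "$workdir/prog.awk" "$workdir/sample.tsv")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the rows whose second column exceeds 0.5 reach the action, so the first sample row is dropped. The explicit `"\t"` in the `print` statement avoids the comma that the test comment in the diff says cannot be used in a value field.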
