
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/README_mummer	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,56 @@
+# Created/shared May 2011
+#
+# Alex Bossers
+# Central Veterinary Institute
+# Wageningen University and Research centre
+# Lelystad, The Netherlands
+#
+# Comments/improvements/bugs: Alex (dot) Bossers (at) wur (dot) nl
+
+
+# WHAT IT DOES
+The MUMmer suite is a set of very basic wrappers for the MUMmer genome comparison tools. Most common operations should be possible
+by using these wrappers. MUMmer works fast on smaller (bacterial) genomes but can also cope with eukaryotic genomes.
+
+In addition to the original MUMmer tools, it contains a conversion script that converts MUMmer comparison files
+(the so-called coords files) into a format readable by the Artemis Comparison Tool (ACT; Sanger UK).
+
+
+# REQUIREMENTS
+- Perl
+- Galaxy :)
+- MUMmer newer than version 3.20;
+      even though older versions might work as well.
+      Get your MUMmer here: http://mummer.sourceforge.net/
+      Make sure MUMmer is in your PATH and/or update the tool xml configs and wrappers for the full MUMmer path
+      if it is different from /opt/MUMmer/MUMmer.
+- ACT can be run locally or via Webstart if you want to visualise genome comparisons in detail: http://www.sanger.ac.uk/resources/software/act
+- GNUplot is a requirement for the MUMmerplot part (see MUMmer installation documentation)
+
+
+# SETUP
+Just unpack the tool xml and perl script somewhere appropriate and adapt the MUMmer installation path if it differs from the one above. Plug the tool into the tool_conf.xml
+of your Galaxy instance and refresh the tools or restart the Galaxy server.
+
+
+# TESTING
+You can test the code by running Nucmer on the test data and visualise the results in MUMmerplot.
+It should return a MUMmerplot identical to the image provided. For reference I also included the corresponding log file.
+
+
+# LICENSE
+Copyright (c) 2011 Central Veterinary Institute of Wageningen UR, Lelystad, The Netherlands.
+MUMmer is copyright by its respective owner. See their licensing details.
+
+Our wrappers/programs are free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3 of the License, or
+(at your option) any later version.
+
+When distributing the tools please include this original reference.
+
+Use this tool at your own risk. Even though we tried to build tools and wrappers that are free of errors,
+check your output since it might be erroneous. We will not be liable for any failure this may have caused.
+
+If you like these scripts, please acknowledge our work.
+
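The requirements listed in the README can be checked from a shell before the tools are plugged into Galaxy. The following is a minimal sketch and not part of the changeset: the function name `check_requirements` is hypothetical, and the program names passed to it assume that Perl, the MUMmer binaries and gnuplot are expected on PATH.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given programs are on PATH.
# Returns the number of missing programs (0 means everything was found).
check_requirements() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Programs named in the README requirements; adjust to your installation.
check_requirements perl mummer nucmer gnuplot ||
    echo "Install the missing programs or add them to PATH."
```

If MUMmer lives outside PATH, the same check can be run with full paths instead of bare program names.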
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_clustering.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,238 @@
+<tool id="mummer_clustering" name="MUMmer Clustering" version="0.9.alx" force_history_refresh="True">
+  <description>: order sequence matches in clusters</description>
+  <command>
+	<!-- update this path to the installed location -->
+		$tool.cmd
+		#if $tool.cmd=="gaps":
+			$in_reference
+			#if $tool.gaps_r=="yes":
+				-r
+			#end if
+		#end if
+		#if $tool.cmd=="mgaps":
+			#if $tool.cmd_C=="yes":
+				-C
+			#end if
+			-d $tool.cmd_d
+			#if $tool.cmd_e=="yes":
+				-e
+			#end if
+			-f $tool.cmd_f
+			-l $tool.cmd_l
+			-s $tool.cmd_s
+		#end if
+		&lt; $tool.in_match_list
+		&gt; $out_tool
+
+  </command>
+	<inputs>
+	  <conditional name="tool">
+		<param name="cmd" type="select" label="MUMmer clustering algorithm" help="Algorithms are run with default parameters (none). For specific args see help below" >
+			<option value="gaps" selected="true">gaps</option>
+			<option value="mgaps">mgaps</option>
+		</param>
+		<when value="gaps">
+			<param name="in_reference" type="data" format="fasta" label="Reference FastA file" />
+			<param name="gaps_r" type="select" label="Use reversed [-r]" >
+				<option value="no" selected="true">No</option>
+				<option value="yes">Yes</option>
+			</param>
+			<param name="in_match_list" type="data" format="text" label="MUMmer match list" help="See help for more details" />
+		</when>
+		<when value="mgaps">
+			<param name="in_match_list" type="data" format="text" label="MUMmer match list" help="See help for more details" />
+			<param name="cmd_C" type="select" label="Check input header labels have reversed keyword [-C]" >
+				<option value="no" selected="true">No</option>
+				<option value="yes">Yes</option>
+			</param>
+			<param name="cmd_d" type="integer" size="5" value="5" label="Max fixed diagonal difference [-d]" />
+			<param name="cmd_e" type="select" label="Use extent of cluster [-e]" >
+				<option value="no" selected="true">No</option>
+				<option value="yes">Yes</option>
+			</param>
+			<param name="cmd_f" type="float" size="5" value="0.05" label="Max fraction separation for diagonal difference [-f]" />
+			<param name="cmd_l" type="integer" size="5" value="200" label="Min cluster length [-l]" />
+			<param name="cmd_s" type="integer" size="5" value="1000" label="Max separation adjacent matches in cluster [-s]" />
+		</when>
+	  </conditional>
+	</inputs>
+	<outputs>
+		<data name="out_tool" format="text" label="Clustering output" />
+	</outputs>
+	<requirements>
+<!--         <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> -->
+        <requirement type="package" version="4.6.4">gnuplot</requirement>
+        <requirement type="package" version="3.23">MUMmer</requirement>
+	</requirements>
+	<tests>
+		<test>
+		</test>
+	</tests>
+	<help>
+|
+
+
+**Reference**
+=============
+
+- **MUMmer clustering Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer suite v3.22:** http://mummer.sourceforge.net
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section instead of replacing existing names :)
+
+
+**MUMmer Clustering**
+=====================
+
+MUMmer's clustering algorithms attempt to order small individual matches into larger match clusters
+in order to make the output of mummer more intelligible. A dot plot makes it easy to spot alignment
+regions from a match list, however when examining the data without graphic aids, it is very difficult
+to draw any reasonable conclusions from the simple flat file list of matches. Clustering the matches
+together into larger groups of neighboring matches makes this process much easier by ordering the
+data and removing spurious matches.
+
+
+Gaps
+----
+
+*gaps* is the primary clustering algorithm for run-mummer1, and although classified as a "clustering"
+step, gaps is more of a sorting routine. It implements the LIS (longest increasing subsequence) algorithm
+to extract the longest consistent set of matches between two sequences, and generates a single
+cluster that represents the best "straight-line" arrangement of matches between the sequences. By
+straight-line, we mean no rearrangements or inversions, just a simple path of agreeing matches
+between the two sequences. This limits the usability of this program to the alignment of genomes
+that are very similar and with no large scale mutations. *gaps* is best suited for the comparison of
+near identical sequences with the goal of finding minor mutations like SNPs and small indels.
+
+Input can be filtered mummer output. The strange syntax is a result of a legacy issue described in
+the Known problems (manual) section, and requires the header be stripped from the mummer output. In
+addition, gaps is only designed to handle a single reference and a single query sequence, thus the
+preceding mummer run must also follow this constraint. The -r is optional and designates the incoming
+matches as reverse complement matches which must reference the reverse complement of the sequence,
+therefore forcing mummer to be run without the -c option.
+
+Reference: http://mummer.sourceforge.net/manual/#gaps
+
+**Output:**
+::
+
+ > /home/aphillip/data/GHP.1con  Consistent matches
+      183       17     22    none      -      -
+      238       72    108    none     33     33
+      347      181     92    none      1      1
+      458      292     50    none     19     19
+      705      539     44    none      1      1
+      750      584     38    none      1      1
+      807      641     23     -16      0      4
+ (output continues ...)
+ > Wrap around
+   334398   329917     47    none      -    225
+   334446   329965     62    none      1      1
+   334539   330058     20    none     31     31
+   334560   330079     92    none      1      1
+   334653   330172     77    none      1      1
+   334740   330259     41    none     10     10
+ (output continues ...)
+ > /home/aphillip/data/GHP.1con  Other matches
+  1317231     4891     21    none      -      -
+  1317275     4927     21    none      -      -
+  1317804     5399     25    none    508    451
+   947580     5436     36    none      -      -
+    23406     5518     34    none      -      -
+   333079     6592     32    none      -      -
+ (output continues ...)
+
+Where the first line is the location of the reference file, and the first three columns are the same
+as the three column match format described in the mummer section. The final three columns are the
+overlap between this match and the previous match, the gap between the start of this match and the
+end of the previous match in the reference, and the gap between the start of this match and the end
+of the previous match in the query respectively.
+
+
+mgaps
+-----
+
+*mgaps* was introduced into the MUMmer pipeline in an effort to better handle large-scale
+rearrangements and duplications. Unlike gaps, mgaps is a full clustering algorithm that is capable
+of generating multiple groups of consistently ordered matches. Clustering is controlled by a set of
+command-line parameters that adjust the minimum cluster size, maximum gap between matches, etc. Only
+matches that were included in clusters will appear in the output, so by adjusting the command-line
+parameters it is possible to filter out many of the spurious matches, thus leaving only the larger
+areas of conservation between the input sequences. The major advantage of mgaps is its ability to
+identify these "islands" of conservation. This frees the user from the single LIS restraints of the
+gaps program and allows for the identification of large-scale rearrangements, duplications, gene
+families and so on.
+
+Gaps can fail to identify clusters because they were not consistent with the LIS. However, by using
+mgaps, all regions of conservation can now be identified. The only drawback is the increased
+complexity of the output: where you once had only one cluster for the whole comparison, you usually
+now get more. Because of this, it can sometimes be difficult to separate the repetitive clusters from
+"correct" clusters, *making mgaps more suited for global alignments instead of localized error detection*.
+
+Input can be raw mummer output. *mgaps* is only designed to handle a single reference and one or
+more query sequences, thus the preceding mummer run must also follow this constraint. Please refer
+to the run-mummer3 script (see online manual) for an example of how to use this program in an
+alignment pipeline. Note that in order to cluster reverse complement matches, the reverse complement
+matches must reference the reverse complement strand of the query sequence, therefore forcing mummer
+to be run without the -c option. A rewrite of this algorithm to handle multiple reference sequences
+and a better coordinate system (forward coordinates for reverse complement matches) is doubtful but
+may eventually appear.
+
+The -d option can be interpreted as the number of insertions allowed between two matches in the same
+cluster, while the -f option is a fraction equal to (diagonal difference / match separation) where
+a higher value will increase the indel tolerance. Minimum cluster length is the sum of the contained
+matches unless the -e option is used. The best way to get a feel for what each parameter controls
+is to cluster the same data set numerous times with different values and observe the resulting
+differences. It can also be helpful to set these parameters to the size of the element you wish to
+capture, i.e. set the minimum cluster size to say the smallest exon you expect and set the max gap
+to the smallest intron you expect to obtain clusters that could represent single exons (depending
+of course on the similarity of the two sequences).
+
+Reference: http://mummer.sourceforge.net/manual/#mgaps
+
+**Output format**
+
+Output of *mgaps* shares much in common with the output of mummer and gaps, with a slightly different
+header formatting than gaps to allow for multiple query sequences and multiple clusters. The output
+of mgaps run on both forward and reverse complement matches is as follows:
+::
+
+ > ID41
+ > ID41 Reverse
+  5177399        1    232    none      -      -
+  5177632      234   6794    none      1      1
+  5184433     7035     24    none      7      7
+  5184468     7069     23    none     11     10
+ > ID42
+    10181       43   1521    none      -      -
+ > ID42 Reverse
+  4654536       17     36    none      -      -
+  4654578       57    298    none      6      4
+  4654877      356    226    none      1      1
+ #
+  4655139      845     28    none      -      -
+  4655178      884    694    none     11     11
+  4655873     1579     20    none      1      1
+ #
+  4850044       17   1492    none      -      -
+  4851537     1510    711    none      1      1
+  4852249     2222     42    none      1      1
+ (output continues ...)
+
+
+Headers containing the ID for each query sequence are listed after the '>' characters, and a
+following Reverse keyword identifies the reverse matches for that query sequence. Individual clusters
+for each sequence are separated by a '#' character, and the six columns are exactly the same as the
+gaps output (see the gaps section for more details).
+
+
+|
+|
+
+	</help>
+</tool>
+
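The mgaps help text above notes that the minimum cluster length is the sum of the contained match lengths unless -e is used. As a rough illustration of that bookkeeping, the sketch below sums column 3 per cluster from mgaps-style output. It is not part of the changeset: the function name `summarize_mgaps` is hypothetical, and it assumes the six-column layout, the '>' headers and the '#' separators described in the help.

```shell
#!/bin/sh
# Hypothetical helper: summarize mgaps output read from files or stdin.
# Prints one tab-separated line per cluster:
#   header, cluster number, number of matches, summed match length.
summarize_mgaps() {
    awk '
        function flush_cluster() {
            if (n > 0) {
                cluster++
                printf "%s\tcluster %d\tmatches=%d\tsum_len=%d\n", header, cluster, n, total
            }
            n = 0
            total = 0
        }
        # A ">" line starts a new query sequence (optionally "Reverse").
        /^[ \t]*>/ {
            flush_cluster()
            sub(/^[ \t]*>[ \t]*/, "")
            header = $0
            cluster = 0
            next
        }
        # A "#" line separates clusters within one sequence.
        /^[ \t]*#/ { flush_cluster(); next }
        # Match lines have six columns; column 3 is the match length.
        NF == 6 { n++; total += $3 }
        END { flush_cluster() }
    ' "$@"
}

# Sample taken from the mgaps output shown in the help text.
summarize_mgaps <<'EOF'
 > ID42
    10181       43   1521    none      -      -
 > ID42 Reverse
  4654536       17     36    none      -      -
  4654578       57    298    none      6      4
  4654877      356    226    none      1      1
 #
  4655139      845     28    none      -      -
  4655178      884    694    none     11     11
  4655873     1579     20    none      1      1
EOF
```

For this sample the summed lengths are 1521 for the forward cluster and 560 and 742 for the two reverse clusters, so all three would pass the default -l 200 threshold.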
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_maxmatch.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,170 @@
+<tool id="mummer_maxmatch" name="MUMmer MaxMatch" version="0.9.alx" force_history_refresh="True">
+  <description>: Maximal exact sequence matching</description>
+  <command>
+	<!-- update this path to the installed location -->
+		$tool.cmd
+		#if $tool.cmd=="mummer":
+			$tool.cmd_extra
+			$tool.mum_ref_in
+			$tool.mum_q_in
+		#end if
+		#if $tool.cmd=="repeat-match":
+			-n $tool.rm_n
+			#if $tool.rm_E=="yes":
+				-E
+			#end if
+			$tool.cmd_extra
+			$tool.in_seq
+		#end if
+		#if $tool.cmd=="exact-tandems":
+			$tool.in_seq
+			$tool.et_minl
+		#end if
+		<!-- Unfortunately the error state also gets set on successful jobs, so stderr is closed below -->
+		2&gt;&amp;-
+		> $out_tool
+
+  </command>
+	<inputs>
+	  <conditional name="tool">
+		<param name="cmd" type="select" value="mummer" label="MUMmer maximal matching" help="Algorithms are run with default parameters (none). For specific args see help below" >
+			<option value="mummer">mummer</option>
+			<option value="repeat-match">repeat-match</option>
+			<option value="exact-tandems">exact-tandems</option>
+		</param>
+		<when value="mummer">
+			<param name="mum_ref_in" type="data" format="fasta" label="Reference FastA file" />
+			<param name="mum_q_in" type="data" format="fasta" label="Query (multi) FastA sequence" />
+			<param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="See specific cmd line options below for each tool" />
+		</when>
+		<when value="repeat-match">
+			<param name="in_seq" type="data" format="fasta" label="FastA sequence file" />
+			<param name="rm_n" type="text" size="5" value="20" label="Minimum exact match length [-n]" />
+			<param name="rm_E" type="select" value="no" label="Use exhaustive (slow) search to find matches [-E]" >
+				<option value="no">No</option>
+				<option value="yes">Yes</option>
+			</param>
+			<param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="-n and -E are configured above. More specific cmd line options in help below." />
+		</when>
+		<when value="exact-tandems">
+			<param name="in_seq" type="data" format="fasta" label="FastA sequence file" />
+			<param name="et_minl" type="text" size="5" value="20" label="Minimum length" />
+		</when>
+	  </conditional>
+	</inputs>
+	<outputs>
+		<data name="out_tool" format="text" label="Max exact match output" />
+	</outputs>
+    <requirements>
+<!--         <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> -->
+        <requirement type="package" version="4.6.4">gnuplot</requirement>
+        <requirement type="package" version="3.23">MUMmer</requirement>
+    </requirements>
+	<tests>
+		<test>
+		</test>
+	</tests>
+	<help>
+|
+
+
+**Reference**
+=============
+
+- **MUMmer MaxExactMatch Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer suite v3.22:** http://mummer.sourceforge.net
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+Please do not use any of the command line options that modify prefixes or file names. They
+serve no purpose within Galaxy and are likely to make the job fail!
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section instead of replacing existing names :)
+
+
+
+**MUMmer Maximal exact matching**
+=================================
+
+The heart of the MUMmer package is its suffix tree based maximal matching routines. These can be
+used for repeat detection within a single sequence as is done by *repeat-match* and *exact-tandems*,
+or can be used for the alignment of two or more sequences as is done by *mummer*.
+
+Mummer
+------
+
+mummer is a suffix tree algorithm designed to find maximal exact matches of some minimum length
+between two input sequences. By default mummer will only find maximal matches that are unique in
+the entire set of reference sequences. The match lists produced by mummer can be used alone to
+generate alignment dot plots, or can be passed on to the clustering algorithms for the identification
+of longer non-exact regions of conservation. These match lists have great versatility because they
+contain huge amounts of information and can be passed forward to other interpretation programs for
+clustering, analysis, searching, etc.
+
+
+Repeat-match
+------------
+
+repeat-match is a suffix tree algorithm designed to find maximal exact repeats within a single input
+sequence. It uses a similar algorithm to mummer, but altered slightly to find maximal exact matches
+within a single sequence.
+
+Output formatting varies depending on the command line parameters and the output can be quite large.
+The standard output format that results from running repeat-match with default parameters is as follows:
+::
+
+ Long Exact Matches:
+    Start1     Start2    Length
+   4919485    4919506r       22
+
+The three columns are the first position of the repeat, the second position of the repeat, and the
+length of the repeat respectively. Reverse complement repeat positions are denoted by an 'r'
+following the Start2 position, and are relative to the forward strand of the sequence.
+
+
+Exact-tandems
+-------------
+
+exact-tandems is a wrapper script for the repeat-match program. It provides a list of exact tandem
+repeats within a single input sequence. As with repeat-match the sequence file should contain only
+one sequence in FastA format, however if multiple sequences exist the first one will be used. The
+sequence may contain any set of upper and lowercase characters, thus DNA and protein sequence are
+both allowed and matching is case insensitive. The minimum match length parameter should be a
+positive integer; this value is passed to the repeat-match program via the -n option.
+
+The output format of exact-tandems is as follows:
+::
+
+ Finding matches
+ Tandem repeats
+    Start   Extent  UnitLen     Copies
+   416173      150       45        3.3
+
+The four columns are the first position of the tandem, the extent of the repeat region, the length
+of each tandem repeat unit, and the number of repeat units respectively.
+
+
+
+**Manuals and CMD line options (specific for each tool!):**
+===========================================================
+
+**Mummer**
+
+http://mummer.sourceforge.net/manual/#mummer
+
+**Repeat-match**
+
+http://mummer.sourceforge.net/manual/#repeat
+
+**exact-tandems**
+
+http://mummer.sourceforge.net/manual/#exact
+
+|
+|
+
+	</help>
+</tool>
+
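The repeat-match output described in the help marks reverse complement repeats with an 'r' after the Start2 position, which is awkward for downstream tabular tools. A small sketch of one way to normalize it follows. It is not part of the changeset: the function name `repeats_to_tsv` and the added strand column are illustrative assumptions, and only the default three-column output format shown in the help is handled.

```shell
#!/bin/sh
# Hypothetical helper: convert repeat-match output to tab-separated rows
# (start1, start2, length, strand) and drop the two header lines.
repeats_to_tsv() {
    awk '
        # Data rows have three columns and a numeric first column.
        NF == 3 && $1 ~ /^[0-9]+$/ {
            strand = "+"
            start2 = $2
            # A trailing "r" marks a reverse complement repeat.
            if (start2 ~ /r$/) {
                strand = "-"
                sub(/r$/, "", start2)
            }
            printf "%s\t%s\t%s\t%s\n", $1, start2, $3, strand
        }
    ' "$@"
}

# Sample taken from the repeat-match output shown in the help text.
repeats_to_tsv <<'EOF'
Long Exact Matches:
   Start1     Start2    Length
  4919485    4919506r       22
EOF
```

The sample row becomes a single line with the fields 4919485, 4919506, 22 and "-" separated by tabs.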
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_tool.sh	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,128 @@
+#!/bin/bash
+## use #!/bin/bash -x for debugging
+
+## Galaxy wrapper for MUMmer (nucmer/promer)
+## Alex Bossers, CVI of Wageningen UR, NL
+## alex_dot_bossers_at_wur_dot_nl
+##
+## Sep 2010
+##
+## Wrapper runs MUMmer nucmer/promer and additional args
+## Calculates the comparison scores (delta and optional coords file)
+## Generates the optional STATIC comparison mummerplot to png (from delta file)
+##
+## finally the script renames (optional) output files to outfiles expected by Galaxy
+##
+##
+## INPUT args:
+## mummer_tool.sh $input_ref $input_query $out_delta $out_coords $out_png $logfile
+##                    @0          @1          @2          @3        @4       @5
+##                $algorithm $keep_delta $make_coords $keep_log $make_image $cmd_extra
+##                     @6        @7           @8          @9        @10         @11
+##
+
+# Function to send error messages.
+log_err() { echo "$@" 1>&2; }
+# path to where mummer suite is installed
+# adjust this for your machine
+# If mummer is available in system path, leave empty
+# when using different path make sure the trailing slash is added.
+# mum_path = /opt/Mummer23/Mummer/
+mum_path=""
+tmp_path="/tmp/mummertmp/"
+
+if [ -z "$mum_path" ] && [ -z "$(which mummer 2>/dev/null)" ]; then
+	log_err "mummer is not available in the system path and not declared in mum_path. Please install mummer."
+	exit 127
+fi
+
+# since we have more than 9 arguments we need to shift the sections or use own array
+args=("$@")
+# to keep things readable assign vars
+input_ref="${args[0]}"
+input_query="${args[1]}"
+out_delta="${args[2]}"
+out_coords="${args[3]}"
+out_png="${args[4]}"
+logfile="${args[5]}"
+algorithm="${args[6]}"
+keep_delta="${args[7]}"
+make_coords="${args[8]}"
+keep_log="${args[9]}"
+make_image="${args[10]}"
+cmd_extra="${args[11]}"
+
+[ -d $tmp_path ] || mkdir $tmp_path
+cd $tmp_path
+
+# enable/disable the STDERR log file
+if [ "$keep_log" == "yes" ]; then
+	logfile_c="2>$logfile"
+	logfile_a="2>>$logfile"
+else
+	#no log requested: close stderr
+	logfile_c="2>&-"
+	logfile_a="2>&-"
+fi
+
+# extra mummer cmd line options
+
+## generate coords file on the fly?
+if [ "$make_coords" == "yes" ]; then
+	options=" --coords"
+fi
+## extra cmd line args to be concatenated in options? We need to prevent extra spaces!
+if [ "$cmd_extra" != "" ]; then
+	if [ "$options" == "" ]; then
+		options=" $cmd_extra"
+	else
+		options="$options $cmd_extra"
+	fi
+fi
+
+# run nucmer/promer
+# May only run Promer and Nucmer
+echo $algorithm
+if [[ $algorithm =~ ...mer$ ]]; then
+	eval "$mum_path$algorithm$options $input_ref $input_query $logfile_c"
+else
+	log_err 'ERROR, algorithm does not conform to ...mer'
+	exit 1
+fi
+
+
+## generate large png if option make_image = yes
+## suppress messages from mummerplot since some report deprecated usage rather than a real error
+## these messages can easily be avoided by modifying the source of mummerplot... just in case
+## however we need to check if a valid png was generated. This is not the case if there is no alignment
+## 1 is stdout and 2 is stderr. stdout is closed; stderr is appended to the log or closed
+if [ "${make_image}" == "yes" ]; then
+	eval "${mum_path}mummerplot --large --png out.delta 1>&- $logfile_a"
+	if [ -f "out.png" ]; then
+		mv out.png $out_png
+		#cleanup temp gnuplot file
+		rm out.gp
+	else
+		log_err "the requested png file does not exist!"
+		exit 1
+	fi
+
+	## clean up remaining files
+	rm out.fplot
+	rm out.rplot
+
+fi
+
+# keep/rename or delete delta file
+if [ "$keep_delta" == "yes" ]; then
+	mv out.delta "$out_delta"
+else
+	rm out.delta
+fi
+
+# keep/rename coords file if it was created
+if [ "$make_coords" == "yes" ]; then
+	mv out.coords "$out_coords"
+fi
+# end script
+exit 0
\ No newline at end of file
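The wrapper is meant to run only nucmer or promer, and the algorithm name ends up inside an `eval`. A pattern such as `...mer$` also matches other names (for example `mummer`), so an explicit whitelist may be safer. The sketch below shows one way to do that; it is not the author's code, and the function name `is_supported_algorithm` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the two aligners the wrapper supports.
is_supported_algorithm() {
    case "$1" in
        nucmer|promer) return 0 ;;
        *) return 1 ;;
    esac
}

# "mummer" ends in three characters plus "mer" but is not a supported aligner.
for candidate in nucmer promer mummer; do
    if is_supported_algorithm "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate" >&2
    fi
done
```

A `case` statement keeps the accepted values visible in one place and avoids regular expression anchoring mistakes.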
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_tool.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,114 @@
+<tool id="mummer_tool" name="MUMmer compare and plot" version="0.4.alx" force_history_refresh="True">
+  <description>: Compare and plot genomes (Nucmer or Promer)</description>
+  <command interpreter="bash">
+  		mummer_tool.sh
+		$input_ref $input_query
+		$out_delta $out_coords $out_png $out_log
+		$algorithm
+		$keep_delta $make_coords $keep_log $make_image
+		$cmd_extra
+  </command>
+	<inputs>
+		<param name="algorithm" type="select" format="text" value="nucmer" label="Algorithm" help="Nucmer aligns DNA sequences; Promer aligns at the protein level (DNA input is six-frame translated)">
+			<option value="nucmer">Nucmer DNA</option>
+			<option value="promer">Promer</option>
+		</param>
+		<param name="input_ref" type="data" format="fasta" label="Reference sequence" />
+		<param name="input_query" type="data" format="fasta" label="Sequence query file"/>
+		<param name="make_image" type="select" format="text" value="yes" label="Generate MUMmerplot" help="MUMmerplot will be run with default settings and --large --png as fixed image.">
+			<option value="yes">Yes</option>
+			<option value="no">No</option>
+		</param>
+		<param name="keep_delta" type="select" format="text" value="no" label="Keep delta file" help="i.e. for further processing">
+			<option value="no">No</option>
+			<option value="yes">Yes</option>
+		</param>
+		<param name="make_coords" type="select" format="text" value="yes" label="Make coords file" help="Uses the -r argument to sort lines by reference.">
+			<option value="no">No</option>
+			<option value="yes">Yes</option>
+		</param>
+		<param name="keep_log" type="select" format="text" value="no" label="Keep console log file" help="i.e. for debugging">
+			<option value="no">No</option>
+			<option value="yes">Yes</option>
+		</param>
+		<param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="the --coords is run by default" />
+	</inputs>
+	<outputs>
+		<data name="out_coords" format="tabular" label="${algorithm.value_label} coords">
+			<filter>make_coords=="yes"</filter>
+		</data>
+		<data name="out_delta" format="tabular" label="${algorithm.value_label} delta">
+			<filter>keep_delta=="yes"</filter>
+		</data>
+		<data name="out_png" format="png" label="${algorithm.value_label} mummerplot">
+			<filter>make_image=="yes"</filter>
+		</data>
+		<data name="out_log" format="tabular" label="Console log file">
+			<filter>keep_log=="yes"</filter>
+		</data>
+	</outputs>
+    <requirements>
+        <requirement type="package" version="4.6">gnuplot</requirement>
+        <requirement type="package" version="3.23">MUMmer</requirement>
+    </requirements>
+     <tests>
+    	<test>
+	      <param name="algorithm" value="nucmer" />
+	      <param name="input_ref" value="test.seq1.fasta"/>
+		  <param name="input_query" value="test.seq2.fasta"/>
+	      <param name="make_image" value="yes"/>
+		  <param name="keep_delta" value="no" />
+		  <param name="make_coords" value="no" />
+		  <param name="keep_log" value="yes" />
+
+	      <output name="out_log" file="test.MUMmerplot.result.Nucmer_galaxy.log" />
+	      <output name="out_png" file="test.MUMmerplot.result.Nucmer_galaxy.png" />
+    	</test>
+	</tests>
+	<help>
+|
+
+
+**Reference**
+-------------
+
+- **Nucmer Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **Nucmer or Promer of MUMmer suite:** v3.22 http://mummer.sourceforge.net/manual/
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section instead of replacing existing names :)
+
+
+**Command line arguments**
+--------------------------
+
+--mum  Use anchor matches that are unique in both the reference and query
+--mumreference  Use anchor matches that are unique in the reference but not necessarily unique in the query (default behavior)
+--maxmatch  Use all anchor matches regardless of their uniqueness
+--breaklen  Distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200)
+--mincluster  Minimum cluster length (default 65)
+--delta  Toggle the creation of the delta file. Setting --nodelta prevents the alignment extension step and only outputs the match clusters (default --delta)
+--depend  Print the dependency information and exit
+--diagfactor  Maximum diagonal difference factor for clustering, i.e. diagonal difference / match separation (default 0.12)
+--extend  Toggle the outward extension of alignments from their anchoring clusters. Setting --noextend will prevent alignment extensions but still align the DNA between clustered matches and create the .delta file (default --extend)
+--forward  Align only the forward strands of each sequence
+--maxgap  Maximum gap between two adjacent matches in a cluster (default 90)
+--help  Print the help information and exit
+--minmatch  Minimum length of a maximal exact match (default 20)
+--optimize  Toggle alignment score optimization. Setting --nooptimize will prevent alignment score optimization and result in sometimes longer, but lower scoring alignments (default --optimize)
+--reverse  Align only the reverse strand of the query sequence to the forward strand of the reference
+--simplify  Simplify alignments by removing shadowed clusters. Turn this option off (--nosimplify) if aligning a sequence to itself to look for repeats (default --simplify)
+--version  Print the version information and exit
+--coords  **Automatically ON in galaxy wrapper!** It generates the .coords file using the 'show-coords' program with the -r option.
+--prefix  **Do NOT use in Galaxy wrapper!** Set the output file prefix (default out)
+
+|
+|
+
+	</help>
+</tool>
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_utilities_tool.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,184 @@
+<tool id="mummer_utilities_tool" name="MUMmer utilities" version="0.9.alx" force_history_refresh="True">
+  <description>: Show and filter on sequence delta file</description>
+  <command>
+  	<!-- update this path to the installed location -->
+		$tool.cmd
+		$cmd_extra
+		$input_delta
+		#if $tool.cmd=="show-aligns":
+			$tool.aligns1
+			$tool.aligns2
+		#end if
+		> $out_tool
+  </command>
+	<inputs>
+	  <conditional name="tool">
+		<param name="cmd" type="select" value="show-snps" label="MUMmer utility" help="Utilities are run with default parameters (none). For utility specific args see help below" >
+			<option value="show-snps">show SNPs</option>
+			<option value="show-tiling">show tiling</option>
+			<option value="show-diff">show diff</option>
+			<option value="show-coords">show coords</option>
+			<option value="show-aligns">show aligns</option>
+			<option value="delta-filter">delta filter</option>
+		</param>
+		<when value="show-aligns">
+			<param name="aligns1" type="text" size="40" value="" label="IdR" help="the FastA header tag of the desired reference sequence" />
+			<param name="aligns2" type="text" size="40" value="" label="IdQ" help="the FastA header tag of the desired query sequence" />
+		</when>
+		<when value="show-snps" />
+		<when value="show-tiling" />
+		<when value="show-coords" />
+		<when value="show-diff" />
+		<when value="delta-filter" />
+ 	  </conditional>
+		<param name="input_delta" type="data" format="tabular" label="MUMmer delta file" />
+		<param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="see specific cmd line options below for each tool" />
+	</inputs>
+	<outputs>
+		<data name="out_tool" format="text" />
+	</outputs>
+    <requirements>
+<!--         <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> -->
+        <requirement type="package" version="4.6.4">gnuplot</requirement>
+        <requirement type="package" version="3.23">MUMmer</requirement>
+    </requirements>
+	<tests>
+		<test>
+		</test>
+	</tests>
+	<help>
+|
+
+
+**Reference**
+=============
+
+- **MUMmer_utilities Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer utilities running on MUMmer delta file:** http://mummer.sourceforge.net/manual
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**MUMmer Utilities**
+====================
+
+All tools use the MUMmer-generated DELTA file! Additional arguments are only required for show-aligns.
+
+Show-coords
+-----------
+
+show-coords parses the delta alignment output of NUCmer and PROmer, and displays summary
+information such as position, percent identity and so on, of each alignment. It is the most
+commonly used tool for analyzing the delta files. *Usually the -r option is used to sort lines by reference.*
+
+
+Show-tiling
+-----------
+
+show-tiling attempts to construct a tiling path out of the query contigs as mapped to the reference
+sequences. Given the delta alignment information of a few long reference sequences and many small
+query contigs, show-tiling will determine the best mapped location of each query contig. Note that
+each contig may only be tiled once, so repetitive regions may cause this program some difficulty.
+This program is useful for aiding in the scaffolding and closure of an unfinished set of contigs,
+if a suitable, high similarity reference genome is available. Or, if using PROmer, show-tiling will
+help in the identification of syntenic regions and their contigs' mapping to the references.
+
+This program is not suitable for "many vs. many" assembly comparisons, however a new tool based on
+the concepts of show-tiling should be available in the near future that will facilitate the mapping
+of assembly contigs.
+
+
+Show-snps
+---------
+
+show-snps is a utility program for reporting polymorphisms contained in a delta encoded alignment
+file output by NUCmer or PROmer. It catalogs all of the single nucleotide polymorphisms (SNPs) and
+insertions/deletions within the delta file alignments. Polymorphisms are reported one per line, in
+a delimited fashion similar to show-coords. Pairing this program with the appropriate MUMmer tools
+can create an easy to use SNP pipeline for the rapid identification of putative SNPs between any
+two sequence sets, as demonstrated in the manual SNP detection section.
+
+
+Show-diff
+---------
+
+Outputs a list of structural differences for each sequence in
+the reference and query, sorted by position. For a reference
+sequence R, and its matching query sequence Q, differences are
+categorized as GAP (gap between two mutually consistent alignments),
+DUP (inserted duplication), BRK (other inserted sequence), JMP
+(rearrangement), INV (rearrangement with inversion), SEQ
+(rearrangement with another sequence). The first five columns of
+the output are seq ID, feature type, feature start, feature end,
+and feature length. Additional columns are added depending on the
+feature type. Negative feature lengths indicate overlapping adjacent
+alignment blocks.
+::
+
+  IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
+  IDR DUP dup-start dup-end dup-length
+  IDR BRK gap-start gap-end gap-length
+  IDR JMP gap-start gap-end gap-length
+  IDR INV gap-start gap-end gap-length
+  IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
+
+Positions always reference the sequence with the given ID. The
+sum of the fifth column (ignoring negative values) is the total
+amount of inserted sequence. Summing the fifth column after removing
+DUP features gives the total unique inserted sequence. Note that unaligned
+sequences are not counted, and could represent additional "unique"
+sequences. See documentation for tips on how to interpret these
+alignment break features.
+
+
+Show-aligns
+-----------
+
+show-aligns parses the delta encoded alignment output of NUCmer and PROmer, and displays
+the pair-wise alignments from the two sequences specified on the command line. It is handy
+for identifying the exact location of errors and looking for SNPs between two sequences.
+
+
+Delta-filter
+------------
+
+delta-filter is a utility program for the manipulation of the delta encoded alignment files output
+by the NUCmer and PROmer pipelines. It takes a delta file as input and filters the information based
+on the various command line switches, outputting only the desired alignments to stdout. Options to filter by
+alignment length, identity, uniqueness and consistency are provided. Certain combinations of these
+options can greatly reduce the number of unwanted alignments in the delta file, thus making the output
+of programs such as show-coords more comprehensible.
+
+
+
+**CMD line options (specific for each tool!):**
+===============================================
+
+**Show-coords**
+
+http://mummer.sourceforge.net/manual/#coords
+
+**Show-tiling**
+
+http://mummer.sourceforge.net/manual/#tiling
+
+**Show-snps**
+
+http://mummer.sourceforge.net/manual/#snps
+
+**Show-aligns**
+
+http://mummer.sourceforge.net/manual/#aligns
+
+**Delta-filter**
+
+http://mummer.sourceforge.net/manual/#filter
+
+
+	</help>
+</tool>
+
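Note on the show-diff help text in mummer_utilities_tool.xml: it states that the sum of the fifth column, ignoring negative values, is the total inserted sequence, and that the same sum without DUP features is the unique inserted sequence. A minimal sketch of that arithmetic follows. The function names `sum_inserted` and `sample_diff` and the sample rows are hypothetical, and the input is assumed to be whitespace-delimited show-diff output with the feature type in column 2 and the length in column 5.

```shell
#!/bin/bash
# Hypothetical helper, not part of the wrapper: sums column 5 of show-diff output.
# Pass "nodup" as the first argument to leave DUP features out of the sum.
sum_inserted() {
    awk -v mode="${1:-all}" '
        # Skip negative lengths (overlapping blocks) and, if asked, DUP rows.
        $5 > 0 && !(mode == "nodup" && $2 == "DUP") { total += $5 }
        END { print total + 0 }
    '
}

# Made-up sample rows in the documented column order: ID, type, start, end, length.
sample_diff() {
    printf '%s\n' \
        'ref1 GAP 100 150 51 40 11' \
        'ref1 DUP 300 399 100' \
        'ref1 BRK 500 480 -19' \
        'ref1 JMP 900 1099 200'
}

sample_diff | sum_inserted          # prints 351 (51 + 100 + 200)
sample_diff | sum_inserted nodup    # prints 251 (DUP row removed)
```

On real data the same function could be fed from a saved show-diff result, for example `sum_inserted nodup < result.diff`, where the file name is only a placeholder.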
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummerplot_tool.sh	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+## simple bash to generate mummerplot of MATCH file
+##
+## Galaxy wrapper by Alex Bossers, CVI of Wageningen UR, Lelystad, NL
+## alex_dot_bossers_at_wur_dot_nl
+##
+##
+## needs a rename of the fixed name to something recognised by galaxy
+## needs cleanout of temp files
+##
+## call is mummerplot $format  $in_match $out_file $cmd_extra
+##             $0        $1         $2       $3       $4
+##
+## since mummerplot uses some deprecated syntax which can be fixed in the source
+## we redirect STDERR to dev/null to circumvent errorstatus in galaxy
+## io redirects 0=stdin 1=stdout 2=stderr to dev/null (or &-)
+
+# Function to send error messages.
+log_err() { echo "$@" 1>&2; }
+
+# path to where mummer suite is installed
+# adjust this for your machine
+# this is the only hard coded path in the scripts
+mum_path=""
+
+if [ -z "$(command -v mummer)" ] && [ -z "$mum_path" ]; then
+	log_err "mummer is not available in the system path and not declared in mum_path. Please install mummer."
+	exit 127
+fi
+
+# some default options to generate a LARGE fixed PNG/POSTSCRIPT image and not an interactive one.
+
+if [ "$1" = "png" ]; then
+	extension="png"
+else
+	extension="ps"
+fi
+
+eval "$mum_path mummerplot --large --$1 $2 1>&- 2>&-"
+if [ -f "out.$extension" ]; then
+	#conditional move to something known by galaxy
+	mv "out.$extension" "$3"
+	#remove gnuplot file
+	rm -f out.gp
+fi
+
+## clean up; -f avoids errors on STDERR when a file was not created
+rm -f out.fplot
+rm -f out.rplot
+
+#end script
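Note on mummerplot_tool.sh: the header comment documents a fourth argument, `$cmd_extra`, but the `eval` line as written passes only the format and the match file to mummerplot. A minimal sketch of how the command line could be assembled with the extra options forwarded is given below. The helper names `plot_extension` and `build_plot_cmd` are hypothetical, and the sketch only builds and prints the command string; it does not run mummerplot.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of the wrapper.

# Map the requested format to the file extension the wrapper looks for.
plot_extension() {
    if [ "$1" = "png" ]; then
        echo "png"
    else
        echo "ps"
    fi
}

# Build the mummerplot command line: fixed --large option, output format,
# match file, then any extra options supplied by the user.
build_plot_cmd() {
    local format="$1" match="$2"
    shift 2
    local cmd="mummerplot --large --$format $match"
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

build_plot_cmd png in.delta --filter
# prints: mummerplot --large --png in.delta --filter
```

Keeping the string assembly in a function makes it easy to check what would be executed before handing it to `eval`, which matters here because the extra options come straight from a free-text Galaxy field.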
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummerplot_tool.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,115 @@
+<tool id="mummerplot_tool" name="MUMmer plot" version="1.0.1" force_history_refresh="true">
+  <description>: Generate MUMmerplots from MUMmer match file</description>
+  <command interpreter="bash">
+  		mummerplot_tool.sh
+		#if $img_format=="png":
+			png $input_match $out_png
+		#else:
+			postscript $input_match $out_postscript
+		#end if
+		$cmd_extra
+  </command>
+	<inputs>
+		<param name="input_match" type="data" format="tabular" label="MUMmer match (delta or tiling) file" />
+		<!-- <conditional name="outType"> -->
+			<param name="img_format" type="select" label="Output format" >
+				<option value="png" selected="true">PNG image</option>
+				<option value="postscript">Postscript</option>
+			</param>
+		<!--  </conditional>  -->
+		<param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="See cmd line options below" />
+	</inputs>
+	<outputs>
+		<data name="out_png" format="png" label="MUMmerplot png">
+			<filter>img_format=="png"</filter>
+		</data>
+		<data name="out_postscript" format="ps" label="MUMmerplot ps">
+			<filter>img_format=="postscript"</filter>
+		</data>
+	</outputs>
+    <requirements>
+<!--         <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> -->
+        <requirement type="package" version="4.6.4">gnuplot</requirement>
+        <requirement type="package" version="3.23">MUMmer</requirement>
+    </requirements>
+	<tests>
+		<test>
+		</test>
+	</tests>
+	<help>
+|
+
+
+**Reference**
+=============
+
+- **MUMmerplot Galaxy tool wrapper: Alex Bossers, CVI of Wageningen UR, The Netherlands**
+
+- **MUMmerplot running on MUMmer-match file:** http://mummer.sourceforge.net/manual#mummerplot
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**MUMmerplot**
+==============
+
+| This plotting tool requires a MUMmer match file (either the delta file or the tiling result file)!
+| MUMmerplot requires gnuplot (www.gnuplot.info) to be installed.
+|
+| **By default the wrapper passes --large and --png/--postscript so that a fixed image is generated instead of an interactive view!** Optional cmd line arguments can be added.
+|
+
+
+
+Mummerplot is a script utility that takes output from *MUMmer, nucmer or promer* as DELTA file, or the
+*show-tiling* result file, and converts it to a format suitable for plotting with gnuplot. The primary
+plot type is an alignment dotplot where a sequence is laid out on each axis and a point is plotted at
+every position where the two sequences show similarity. As an extension to this plot style, mummerplot
+is also able to offset multiple 1-vs-1 dotplots to form a multiplot where multiple sequences can be
+laid out on each axis. This plot style is especially handy for browsing an alignment of two contig
+sets. Identity plots are also possible by coloring each data point with a color gradient representing
+identity, or by collapsing the y-axis data onto a single line and then vertically offsetting the
+data points by their identities. In addition to producing the plot data, mummerplot also generates a
+gnuplot script that will be evaluated in order to generate the graph.
+
+
+The *match file* can either be a match list from mummer (either 3 or 4 column format),
+the delta file from nucmer or promer, or the default output from show-tiling. mummerplot will
+automatically detect the type of input file it is given, regardless of its file extension, or it
+will fail if the input file is of an unrecognized type.
+
+
+
+Optional command line arguments
+-------------------------------
+
+--breaklen  Highlight alignments with a breakpoint further than the given distance from the nearest sequence end
+--color  Color plot lines with a percent similarity gradient; use --nocolor to turn off all color (default: color by match direction)
+--coverage  Generate a reference coverage plot, also known as a percent identity plot (default behavior for show-tiling input)
+--depend  Print dependency information and exit
+--filter  Only display alignments which represent the "best" one-to-one mapping of reference and query subsequences (requires delta formatted input)
+--help  Print help information and exit
+--layout  Layout a multiplot by ordering and orienting sequences such that the largest hits cluster near the main diagonal (requires delta formatted input)
+--prefix  *do not use in galaxy!* Set the output file prefix (default 'out')
+--rv  Reverse video, swap the foreground and background colors for x11 plots (requires x11 terminal)
+--IdR  Select a specific reference sequence for the x-axis
+--IdQ  Select a specific query sequence for the y-axis
+--Rfile  Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired reference sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-]
+--Qfile  Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired query sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-]
+--size  Set the output size to small, medium or large
+--large  **enabled by default to generate a high-resolution image**. The other size options (--small, --medium) therefore have no effect.
+--SNP  Highlight SNP locations in the alignment
+--terminal  *do not use in galaxy* Set the output terminal to x11, postscript or png
+--png  **use either png or postscript for a fixed image**. The interactive x11 terminal is not enabled.
+--postscript  Alternate output format instead of png.
+--xrange  Set the x-range for the plot in the form "[min,max]"
+--yrange  Set the y-range for the plot in the form "[min,max]"
+--version  Display version information and exit
+
+
+	</help>
+</tool>
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/nucmer_coords2ACT_galaxy.pl	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,42 @@
+#!/usr/bin/perl
+
+# converts the MUMmer-nucmer coords file into a file readable by the Artemis Comparison Tool
+# Output format resembles the BLAST crunch format
+#
+# [nov 2010] Galaxy wrapped up version
+#
+# Alex.Bossers@wur.nl
+
+
+use warnings;
+use strict;
+
+#$filename=shift;
+   #$ARGV[0] =~ m/^([A-Z0-9_.-]+)$/ig;
+my $filename = $ARGV[0];
+   #$ARGV[1] =~ m/^([A-Z0-9_.-]+)$/ig;
+my $fileout = $ARGV[1];
+#my $filename	=	"Curated_vs_noncurated_8067_01.nucmer.coords";
+#my $fileout	=	"Curated_vs_noncurated_8067_01.nucmer.tab";
+
+open (COORDS,$filename) || die "error opening input coords file: $!";
+open (OUT,">$fileout") || die "error opening tab output file: $!";
+
+while (<COORDS>)
+         {
+    unless ($_ =~ /^(\s*)\d/){next}
+    $_ =~ s/\|//g;
+
+    my @f = split;
+          # create crude match score = ((length_of_match * %identity)-(length_of_match * (100 - %identity))) /20
+    my $crude_plus_score=($f[4]*$f[6]);
+    my $crude_minus_score=($f[4]*(100-$f[6]));
+    my $crude_score=  int(($crude_plus_score  - $crude_minus_score) / 20);
+          # reorganise columns and print crunch format to the output file
+          # score        %id   S1    E1    seq1  S2    E2    seq2  (description)
+    print OUT " $crude_score $f[6] $f[0] $f[1] $f[7] $f[2] $f[3] $f[8] nucmer comparison coordinates\n"
+         }
+
+close (COORDS);
+close (OUT);
+print "Done!\n\n";
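Note on the crude score in nucmer_coords2ACT_galaxy.pl: for a 1000 bp match at 95% identity the formula gives int((1000 * 95 - 1000 * 5) / 20) = 4500. The sketch below mirrors the Perl loop in awk as an illustration only. The function name `coords_to_crunch` is hypothetical, and the input is assumed to follow the default show-coords column layout (S1 E1 | S2 E2 | LEN1 LEN2 | %IDY | reference tag, query tag), which is what the Perl field indices imply.

```shell
#!/bin/bash
# Hypothetical awk equivalent of the Perl conversion loop, for illustration.
# Assumed columns after removing the pipes:
#   $1=S1 $2=E1 $3=S2 $4=E2 $5=LEN1 $6=LEN2 $7=%IDY $8=ref tag $9=query tag
coords_to_crunch() {
    awk '
        # Only data lines start with optional whitespace followed by a digit.
        /^[ \t]*[0-9]/ {
            gsub(/[|]/, "")    # removing the pipes makes awk re-split the fields
            score = int(($5 * $7 - $5 * (100 - $7)) / 20)
            printf " %d %s %s %s %s %s %s %s nucmer comparison coordinates\n", score, $7, $1, $2, $8, $3, $4, $9
        }
    '
}

# Made-up coords line: a 1000 bp match at 95% identity.
printf '%s\n' '       1     1000  |        1     1000  |     1000     1000  |    95.00  | ref1 qry1' | coords_to_crunch
# prints:  4500 95.00 1 1000 ref1 1 1000 qry1 nucmer comparison coordinates
```

Header and separator lines of a coords file do not start with a digit, so they are skipped in the same way as in the Perl script.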
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/nucmer_coords2ACT_galaxy.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,39 @@
+<tool id="MUMmer2ACT_tool" name="MUMmer2ACT" version="0.1.alx" force_history_refresh="True">
+  <description>: convert MUMmer comparison (coords) file to ACT (Artemis)</description>
+  <command interpreter="perl">
+  	nucmer_coords2ACT_galaxy.pl $in_coords $out_act
+  </command>
+	<inputs>
+		<param name="in_coords" type="data" format="tabular" label="MUMmer coords file to use" help="i.e. a nucmer comparison (coords) file" />
+	</inputs>
+	<outputs>
+		<data name="out_act" format="tabular" label="ACT conversion of coords" />
+	</outputs>
+	<requirements>
+	  <!-- <requirement type="perl-script">nucmer_coords2ACT_galaxy.pl</requirement> -->
+	</requirements>
+	<tests>
+		<test>
+		</test>
+	</tests>
+	<help>
+|
+|
+
+**Info**
+--------
+
+This tool will convert the MUMmer comparison file (run MUMmer with the coords option) into a "blast crunch" file
+that can be read as a comparison file in the Artemis Comparison Tool (ACT).
+
+It will output a single tabular crunch file (save it with the .tab extension on Windows systems).
+
+**Reference/questions/remarks**
+
+- *Conversion perl script and wrapper:* Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+
+
+	</help>
+</tool>
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/suite_config.xml	Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,22 @@
+<suite id="MUMmer_toolsuite" name="Suite of MUMmer tools" version="1.0.0">
+	<description>This suite contains MUMmer genome alignment tools and parsers</description>
+	<tool id="mummer_tool" name="MUMmer" version="0.4.alx">
+		<description>: Compare genomes by alignment (Nucmer or Promer)</description>
+	</tool>
+	<tool id="mummer_maxmatch" name="MUMmer MaxMatch" version="0.9.alx" >
+		<description>: Maximal exact sequence matching</description>
+	</tool>
+	<tool id="mummer_clustering" name="MUMmer Clustering" version="0.9.alx">
+		<description>: order sequence matches in clusters</description>
+	</tool>
+	<tool id="mummer_utilities_tool" name="MUMmer utilities" version="0.9.alx">
+		<description>: Show and filter on sequence delta file</description>
+	</tool>
+	<tool id="mummerplot_tool" name="MUMmer plot" version="1.0.1">
+		<description>: Generate MUMmerplots from MUMmer match file</description>
+	</tool>
+	<tool id="MUMmer2ACT_tool" name="MUMmer2ACT" version="0.1.alx">
+		<description>: convert MUMmer comparison (coords) file to ACT (Artemis)</description>
+	</tool>
+</suite>
+