# HG changeset patch
# User abossers
# Date 1414511973 -3600
# Node ID 479eb076cd236d43591a90ef9407f719dd629631
# Parent c1c38335322e5fd7d221bd4217258b8f19149c61
Add revised mummer toolshed files to testtoolshed
diff -r c1c38335322e -r 479eb076cd23 MUMmer/README_mummer
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/README_mummer Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,56 @@
+# Created/shared May 2011
+#
+# Alex Bossers
+# Central Veterinary Institute
+# Wageningen University and Research centre
+# Lelystad, The Netherlands
+#
+# Comments/improvements/bugs: Alex (dot) Bossers (at) wur (dot) nl
+
+
+# WHAT IT DOES
+The MUMmer suite is a set of very basic wrappers for the MUMmer genome comparison tools. Most common operations should be possible
+by using these wrappers. MUMmer works fast on smaller (bacterial) genomes but can also cope with eukaryotic genomes.
+
+In addition to the original MUMmer tools, it contains a conversion script that converts MUMmer comparison files,
+the so-called coords files, into a format readable by the Artemis Comparison Tool (ACT; Sanger UK).
+
+
+# REQUIREMENTS
+- Perl
+- Galaxy :)
+- MUMmer newer than version 3.20;
+ even though older versions might work as well.
+ Get your MUMmer here: http://mummer.sourceforge.net/
+ Make sure MUMmer is in your PATH and/or update the tool xml configs and wrappers for the full MUMmer path
+ if it is different from /opt/MUMmer/MUMmer.
+- ACT can be run locally or via Webstart if you want to visualise genome comparisons in detail: http://www.sanger.ac.uk/resources/software/act
+- GNUplot is a requirement for the MUMmerplot part (see MUMmer installation documentation)
+
+
+# SETUP
+Just unpack the tool xml and perl script somewhere appropriate and adapt the MUMmer installation path if it differs from the one above. Register the tool in the tool_conf.xml
+of your Galaxy instance and refresh the tools or restart the Galaxy server.
+
+
+# TESTING
+You can test the code by running Nucmer on the test data and visualise the results in MUMmerplot.
+It should return a MUMmerplot identical to the image provided. For reference I also included the corresponding log file.
+
+
+# LICENSE
+Copyright (c) 2011 Central Veterinary Institute of Wageningen UR, Lelystad, The Netherlands.
+MUMmer is copyright by its respective owner. See their licensing details.
+
+Our wrappers/programs are free software; you can redistribute them and/or modify
+them under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3 of the License, or
+(at your option) any later version.
+
+When distributing the tools please include this original reference.
+
+Use this tool at your own risk. Even though we tried to build tools and wrappers that are free of errors,
+check your output since it might be erroneous. We will not be liable for any failure this may have caused.
+
+If you like these scripts, please acknowledge our work.
+
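Note on MUMmer/README_mummer: the REQUIREMENTS section above lists the programs the wrappers expect to find. A pre-flight check could look roughly like the sketch below. The function name `missing_tools` and the exact list of program names passed to it are illustrative assumptions and are not part of this changeset.

```shell
#!/bin/sh
# Hypothetical helper: print every program name given as an argument
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call; the program names follow the README requirements.
missing=$(missing_tools perl mummer nucmer mummerplot gnuplot)
if [ -n "$missing" ]; then
    echo "missing programs:" $missing >&2
fi
```

The check only covers programs reachable through the PATH; an installation that relies on a hard-coded MUMmer path in the wrappers would need that path tested instead.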
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummer_clustering.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_clustering.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,238 @@
+
+ : order sequence matches in clusters
+
+
+ $tool.cmd
+ #if $tool.cmd=="gaps":
+ $in_reference
+ #if $tool.gaps_r=="yes":
+ -r
+ #end if
+ #end if
+ #if $tool.cmd=="mgaps":
+ #if $tool.cmd_C=="yes":
+ -C
+ #end if
+ -d $tool.cmd_d
+ #if $tool.cmd_e=="yes":
+ -e
+ #end if
+ -f $tool.cmd_f
+ -l $tool.cmd_l
+ -s $tool.cmd_s
+ #end if
+ < $tool.in_match_list
+ > $out_tool
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ gnuplot
+ MUMmer
+
+
+
+
+
+
+|
+
+
+**Reference**
+=============
+
+- **MUMmer clustering Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer suite v3.22:** http://mummer.sourceforge.net
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**MUMmer Clustering**
+=====================
+
+MUMmer's clustering algorithms attempt to order small individual matches into larger match clusters
+in order to make the output of mummer more intelligible. A dot plot makes it easy to spot alignment
+regions from a match list, however when examining the data without graphic aids, it is very difficult
+to draw any reasonable conclusions from the simple flat file list of matches. Clustering the matches
+together into larger groups of neighboring matches makes this process much easier by ordering the
+data and removing spurious matches.
+
+
+Gaps
+----
+
+*gaps* is the primary clustering algorithm for run-mummer1, and although classified as a "clustering"
+step, gaps is more of a sorting routine. It implements the LIS (longest increasing subset) algorithm
+to extract the longest consistent set of matches between two sequences, and generates a single
+cluster that represents the best "straight-line" arrangement of matches between the sequences. By
+straight-line, we mean no rearrangements or inversions, just a simple path of agreeing matches
+between the two sequences. This limits the usability of this program to the alignment of genomes
+that are very similar and with no large scale mutations. *gaps* is best suited for the comparison of
+near identical sequences with the goal of finding minor mutations like SNPs and small indels.
+
+Input can be filtered mummer output. The strange syntax is a result of a legacy issue described in
+the Known problems (manual) section, and requires the header be stripped from the mummer output. In
+addition, gaps is only designed to handle a single reference and a single query sequence, thus the
+preceding mummer run must also follow this constraint. The -r is optional and designates the incoming
+matches as reverse complement matches which must reference the reverse complement of the sequence,
+therefore forcing mummer to be run without the -c option.
+
+Reference: http://mummer.sourceforge.net/manual/#gaps
+
+**Output:**
+::
+
+ > /home/aphillip/data/GHP.1con Consistent matches
+ 183 17 22 none - -
+ 238 72 108 none 33 33
+ 347 181 92 none 1 1
+ 458 292 50 none 19 19
+ 705 539 44 none 1 1
+ 750 584 38 none 1 1
+ 807 641 23 -16 0 4
+ (output continues ...)
+ > Wrap around
+ 334398 329917 47 none - 225
+ 334446 329965 62 none 1 1
+ 334539 330058 20 none 31 31
+ 334560 330079 92 none 1 1
+ 334653 330172 77 none 1 1
+ 334740 330259 41 none 10 10
+ (output continues ...)
+ > /home/aphillip/data/GHP.1con Other matches
+ 1317231 4891 21 none - -
+ 1317275 4927 21 none - -
+ 1317804 5399 25 none 508 451
+ 947580 5436 36 none - -
+ 23406 5518 34 none - -
+ 333079 6592 32 none - -
+ (output continues ...)
+
+Where the first line is the location of the reference file, and the first three columns are the same
+as the three column match format described in the mummer section. The final three columns are the
+overlap between this match and the previous match, the gap between the start of this match and the
+end of the previous match in the reference, and the gap between the start of this match and the end
+of the previous match in the query respectively.
+
+
+mgaps
+-----
+
+*mgaps* was introduced into the MUMmer pipeline in an effort to better handle large-scale
+rearrangements and duplications. Unlike gaps, mgaps is a full clustering algorithm that is capable
+of generating multiple groups of consistently ordered matches. Clustering is controlled by a set of
+command-line parameters that adjust the minimum cluster size, maximum gap between matches, etc. Only
+matches that were included in clusters will appear in the output, so by adjusting the command-line
+parameters it is possible to filter out many of the spurious matches, thus leaving only the larger
+areas of conservation between the input sequences. The major advantage of mgaps is its ability to
+identify these "islands" of conservation. This frees the user from the single LIS restraints of the
+gaps program and allows for the identification of large-scale rearrangements, duplications, gene
+families and so on.
+
+Gaps can fail to identify clusters because they were not consistent with the LIS. However, by using
+mgaps, all regions of conservation can now be identified. The only drawback is the increased
+complexity of the output: where you once had only one cluster for the whole comparison, you usually
+now get more. Because of this, it can sometimes be difficult to separate the repetitive clusters from
+"correct" clusters, *making mgaps more suited for global alignments instead of localized error detection*.
+
+Input can be raw mummer output. *mgaps* is only designed to handle a single reference and one or
+more query sequences, thus the preceding mummer run must also follow this constraint. Please refer
+to the run-mummer3 script (see online manual) for an example of how to use this program in an
+alignment pipeline. Note that in order to cluster reverse complement matches, the reverse complement
+matches must reference the reverse complement strand of the query sequence, therefore forcing mummer
+to be run without the -c option. A rewrite of this algorithm to handle multiple reference sequences
+and a better coordinate system (forward coordinates for reverse complement matches) is doubtful but
+may eventually appear.
+
+The -d option can be interpreted as the number of insertions allowed between two matches in the same
+cluster, while the -f option is a fraction equal to (diagonal difference / match separation) where
+a higher value will increase the indel tolerance. Minimum cluster length is the sum of the contained
+matches unless the -e option is used. The best way to get a feel for what each parameter controls
+is to cluster the same data set numerous times with different values and observe the resulting
+differences. It can also be helpful to set these parameters to the size of the element you wish to
+capture, i.e. set the minimum cluster size to say the smallest exon you expect and set the max gap
+to the smallest intron you expect to obtain clusters that could represent single exons (depending
+of course on the similarity of the two sequences).
+
+Reference: http://mummer.sourceforge.net/manual/#mgaps
+
+**Output format**
+
+Output of *mgaps* shares much in common with the output of mummer and gaps, with a slightly different
+header formatting than gaps to allow for multiple query sequences and multiple clusters. The output
+of mgaps run on both forward and reverse complement matches is as follows:
+::
+
+ > ID41
+ > ID41 Reverse
+ 5177399 1 232 none - -
+ 5177632 234 6794 none 1 1
+ 5184433 7035 24 none 7 7
+ 5184468 7069 23 none 11 10
+ > ID42
+ 10181 43 1521 none - -
+ > ID42 Reverse
+ 4654536 17 36 none - -
+ 4654578 57 298 none 6 4
+ 4654877 356 226 none 1 1
+ #
+ 4655139 845 28 none - -
+ 4655178 884 694 none 11 11
+ 4655873 1579 20 none 1 1
+ #
+ 4850044 17 1492 none - -
+ 4851537 1510 711 none 1 1
+ 4852249 2222 42 none 1 1
+ (output continues ...)
+
+
+Headers containing the ID for each query sequence are listed after the '>' characters, and a
+following Reverse keyword identifies the reverse matches for that query sequence. Individual clusters
+for each sequence are separated by a '#' character, and the six columns are exactly the same as the
+gaps output (see the gaps section for more details).
+
+
+|
+|
+
+
+
+
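Note on MUMmer/mummer_clustering.xml: the help text above describes the mgaps output as '>' header lines followed by match lines, with '#' separating the clusters of one sequence. Under that description alone, clusters per header could be counted with the following sketch. The function name `count_clusters` is hypothetical, and the input is a shortened copy of the sample shown in the help text.

```shell
#!/bin/sh
# Hypothetical helper: read gaps/mgaps-style output on stdin and print
# "<header><TAB><number of clusters>" for every '>' header line.
count_clusters() {
    awk '
        /^[ \t]*>/ {
            if (seen) print header "\t" clusters
            header = $0
            sub(/^[ \t]*>[ \t]*/, "", header)
            seen = 1; clusters = 0; in_cluster = 0
            next
        }
        # a "#" line closes the current cluster
        /^[ \t]*#/ { in_cluster = 0; next }
        # a match line starts with a numeric reference position
        $1 ~ /^[0-9]+$/ && NF >= 3 {
            if (!in_cluster) { clusters++; in_cluster = 1 }
        }
        END { if (seen) print header "\t" clusters }
    '
}

count_clusters <<'EOF'
> ID41
> ID41 Reverse
 5177399 1 232 none - -
 5177632 234 6794 none 1 1
> ID42
 10181 43 1521 none - -
> ID42 Reverse
 4654536 17 36 none - -
 #
 4655139 845 28 none - -
 #
 4850044 17 1492 none - -
EOF
```

For this shortened sample the sketch reports 0, 1, 1 and 3 clusters for the four headers. Applied to gaps output it would treat the "Wrap around" line as a header of its own, which may or may not be what is wanted.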
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummer_maxmatch.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_maxmatch.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,170 @@
+
+ : Maximal exact sequence matching
+
+
+ $tool.cmd
+ #if $tool.cmd=="mummer":
+ $tool.cmd_extra
+ $tool.mum_ref_in
+ $tool.mum_q_in
+ #end if
+ #if $tool.cmd=="repeat-match":
+ -n $tool.rm_n
+ #if $tool.rm_E=="yes":
+ -E
+ #end if
+ $tool.cmd_extra
+ $tool.in_seq
+ #end if
+ #if $tool.cmd=="exact-tandems":
+ $tool.in_seq
+ $tool.et_minl
+ #end if
+
+ 2>&-
+ > $out_tool
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ gnuplot
+ MUMmer
+
+
+
+
+
+
+|
+
+
+**Reference**
+=============
+
+- **MUMmer MaxExactMatch Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer suite v3.22:** http://mummer.sourceforge.net
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+Please do not use any of the command line options that modify prefixes or file names. They serve
+no purpose within Galaxy and are likely to make the run fail!
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+
+**MUMmer Maximal exact matching**
+=================================
+
+The heart of the MUMmer package is its suffix tree based maximal matching routines. These can be
+used for repeat detection within a single sequence as is done by *repeat-match* and *exact-tandems*,
+or can be used for the alignment of two or more sequences as is done by *mummer*.
+
+Mummer
+------
+
+mummer is a suffix tree algorithm designed to find maximal exact matches of some minimum length
+between two input sequences. By default mummer will only find maximal matches that are unique in
+the entire set of reference sequences. The match lists produced by mummer can be used alone to
+generate alignment dot plots, or can be passed on to the clustering algorithms for the identification
+of longer non-exact regions of conservation. These match lists have great versatility because they
+contain huge amounts of information and can be passed forward to other interpretation programs for
+clustering, analysis, searching, etc.
+
+
+Repeat-match
+------------
+
+repeat-match is a suffix tree algorithm designed to find maximal exact repeats within a single input
+sequence. It uses a similar algorithm to mummer, but altered slightly to find maximal exact matches
+within a single sequence.
+
+Output formatting varies depending on the command line parameters and the output can be quite large.
+The standard output format that results from running repeat-match with default parameters is as follows:
+::
+
+ Long Exact Matches:
+ Start1 Start2 Length
+ 4919485 4919506r 22
+
+The three columns are the first position of the repeat, the second position of the repeat, and the
+length of the repeat respectively. Reverse complement repeat positions are denoted by an 'r'
+following the Start2 position, and are relative to the forward strand of the sequence.
+
+
+Exact-tandems
+-------------
+
+exact-tandems is a wrapper script for the repeat-match program. It provides a list of exact tandem
+repeats within a single input sequence. As with repeat-match the sequence file should contain only
+one sequence in FastA format, however if multiple sequences exist the first one will be used. The
+sequence may contain any set of upper and lowercase characters, thus DNA and protein sequence are
+both allowed and matching is case insensitive. The minimum match length parameter should be a
+positive integer, this value will be passed to the repeat-match program via the -n option.
+
+The output format of exact-tandems is as follows:
+::
+
+ Finding matches
+ Tandem repeats
+ Start Extent UnitLen Copies
+ 416173 150 45 3.3
+
+The four columns are the first position of the tandem, the extent of the repeat region, the length
+of each tandem repeat unit, and the number of repeat units respectively.
+
+
+
+**Manuals and CMD line options (specific for each tool!):**
+===========================================================
+
+**Mummer**
+
+http://mummer.sourceforge.net/manual/#mummer
+
+**Repeat-match**
+
+http://mummer.sourceforge.net/manual/#repeat
+
+**exact-tandems**
+
+http://mummer.sourceforge.net/manual/#exact
+
+|
+|
+
+
+
+
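Note on MUMmer/mummer_maxmatch.xml: the help text above states that reverse complement repeats in repeat-match output carry an 'r' after the Start2 position. Based only on that description, such repeats could be extracted as sketched below. The function name `reverse_repeats` is hypothetical, and the input is the sample from the help text.

```shell
#!/bin/sh
# Hypothetical helper: keep only reverse complement repeats from
# repeat-match output on stdin and print "start1 start2 length"
# with the trailing 'r' removed from the second position.
reverse_repeats() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+r$/ && NF == 3 {
        print $1, substr($2, 1, length($2) - 1), $3
    }'
}

reverse_repeats <<'EOF'
Long Exact Matches:
 Start1 Start2 Length
 4919485 4919506r 22
EOF
```

Header lines and forward repeats are skipped because their second column does not match the digits-plus-'r' pattern.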
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummer_tool.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_tool.sh Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,128 @@
+#!/bin/bash
+## use #!/bin/bash -x for debugging
+
+## Galaxy wrapper for MUMmer (nucmer/promer)
+## Alex Bossers, CVI of Wageningen UR, NL
+## alex_dot_bossers_at_wur_dot_nl
+##
+## Sep 2010
+##
+## Wrapper runs MUMmer nucmer/promer and additional args
+## Calculates the comparison scores (delta and optional coords file)
+## Generates the optional STATIC comparison mummerplot to png (from delta file)
+##
+## finally the script renames (optional) output files to outfiles expected by Galaxy
+##
+##
+## INPUT args:
+## mummer_tool.sh $input_ref $input_query $out_delta $out_coords $out_png $logfile
+## @0 @1 @2 @3 @4 @5
+## $algorithm $keep_delta $make_coords $keep_log $make_image $cmd_extra
+## @6 @7 @8 @9 @10 @11
+##
+
+# Function to send error messages.
+log_err() { echo "$@" 1>&2; }
+# path to where mummer suite is installed
+# adjust this for your machine
+# If mummer is available in system path, leave empty
+# when using different path make sure the trailing slash is added.
+# mum_path = /opt/Mummer23/Mummer/
+mum_path=""
+tmp_path="/tmp/mummertmp/"
+
+if [ -z "$mum_path" ] && [ -z "$(command -v mummer)" ]; then
+ log_err "mummer is not available in system path and not declared in mum_path. Please install mummer."
+ exit 127
+fi
+
+# since we have more than 9 arguments we need to shift the sections or use own array
+args=("$@")
+# to keep things readable assign vars
+input_ref="${args[0]}"
+input_query="${args[1]}"
+out_delta="${args[2]}"
+out_coords="${args[3]}"
+out_png="${args[4]}"
+logfile="${args[5]}"
+algorithm="${args[6]}"
+keep_delta="${args[7]}"
+make_coords="${args[8]}"
+keep_log="${args[9]}"
+make_image="${args[10]}"
+cmd_extra="${args[11]}"
+
+[ -d "$tmp_path" ] || mkdir -p "$tmp_path"
+cd "$tmp_path" || exit 1
+
+# enable/disable the STDERR log file
+if [ "$keep_log" == "yes" ]; then
+ logfile_c="2>$logfile"
+ logfile_a="2>>$logfile"
+else
+ # discard STDERR
+ logfile_c="2>/dev/null"
+ logfile_a="2>/dev/null"
+fi
+
+# extra mummer cmd line options
+
+## generate coords file on the fly?
+if [ "$make_coords" == "yes" ]; then
+ options=" --coords"
+fi
+## extra cmd line args to be concatenated in options? We need to prevent extra spaces!
+if [ "$cmd_extra" != "" ]; then
+ if [ "$options" == "" ]; then
+ options=" $cmd_extra"
+ else
+ options="$options $cmd_extra"
+ fi
+fi
+
+# run nucmer/promer
+# May only run Promer and Nucmer
+echo "$algorithm"
+if [[ $algorithm =~ ^(nuc|pro)mer$ ]]; then
+ eval "$mum_path$algorithm$options $input_ref $input_query $logfile_c"
+else
+ log_err 'ERROR, algorithm must be nucmer or promer'
+ exit 1
+fi
+
+
+## generate large png if option make_image = yes
+## suppress errors from mummerplot since some of its syntax is deprecated but not a real error
+## error can be easily avoided by modifying the source of mummerplot... just in case
+## however we need to check if a valid png was generated. This is not the case if there is no alignment
+## 1 is stdout and 2 is stderr. stdout is redirected to /dev/null
+if [ "${make_image}" == "yes" ]; then
+ eval "${mum_path}mummerplot --large --png out.delta 1>/dev/null $logfile_a"
+ if [ -f "out.png" ]; then
+ mv out.png "$out_png"
+ #cleanup temp gnuplot file
+ rm -f out.gp
+ else
+ log_err "mummerplot did not produce the expected png file"
+ exit 1
+ fi
+
+ ## clean up remaining files
+ rm -f out.fplot
+ rm -f out.rplot
+
+fi
+
+# keep/rename or delete delta file
+if [ "$keep_delta" == "yes" ]; then
+ mv out.delta "$out_delta"
+else
+ rm out.delta
+fi
+
+# keep/rename coords file if it was created
+if [ "$make_coords" == "yes" ]; then
+ mv out.coords "$out_coords"
+fi
+# end script
+exit 0
\ No newline at end of file
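Note on MUMmer/mummer_tool.sh: the script above assembles the nucmer/promer call as a string and runs it through eval. As an alternative sketch (not the author's method), the argument list can be built with the positional parameters and the algorithm checked against a fixed list. The function name `build_aligner_cmd` and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the aligner command, one argument per line.
# Arguments: mum_path algorithm make_coords cmd_extra reference query
# The body runs in a subshell (parentheses), so "set -f" and the
# variables do not leak into the caller.
build_aligner_cmd() (
    mum_path=$1 algorithm=$2 make_coords=$3 cmd_extra=$4 ref=$5 query=$6
    case $algorithm in
        nucmer|promer) ;;
        *) echo "ERROR, algorithm must be nucmer or promer" >&2; exit 1 ;;
    esac
    set -- "${mum_path}${algorithm}"
    if [ "$make_coords" = "yes" ]; then
        set -- "$@" --coords
    fi
    set -f                      # no filename expansion while splitting
    set -- "$@" $cmd_extra      # split the extra options on whitespace
    set -- "$@" "$ref" "$query"
    printf '%s\n' "$@"
)

build_aligner_cmd "" nucmer yes "--maxmatch -c 100" ref.fasta query.fasta
```

The sketch only prints the arguments. A wrapper that wanted to run the command would execute "$@" at the point where the sketch calls printf, which keeps file names with spaces intact and avoids a second round of shell parsing.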
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummer_tool.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_tool.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,114 @@
+
+ : Compare and plot genomes (Nucmer or Promer)
+
+ mummer_tool.sh
+ $input_ref $input_query
+ $out_delta $out_coords $out_png $out_log
+ $algorithm
+ $keep_delta $make_coords $keep_log $make_image
+ $cmd_extra
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ make_coords=="yes"
+
+
+ keep_delta=="yes"
+
+
+ make_image=="yes"
+
+
+ keep_log=="yes"
+
+
+
+ gnuplot
+ MUMmer
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+|
+
+
+**Reference**
+-------------
+
+- **Nucmer Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **Nucmer or Promer of MUMmer suite:** v3.22 http://mummer.sourceforge.net/manual/
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**Command line arguments**
+--------------------------
+
+--mum Use anchor matches that are unique in both the reference and query
+--mumreference Use anchor matches that are unique in the reference but not necessarily unique in the query (default behavior)
+--maxmatch Use all anchor matches regardless of their uniqueness
+--breaklen Distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200)
+--mincluster Minimum cluster length (default 65)
+--delta Toggle the creation of the delta file. Setting --nodelta prevents the alignment extension step and only outputs the match clusters (default --delta)
+--depend Print the dependency information and exit
+--diagfactor Maximum diagonal difference factor for clustering, i.e. diagonal difference / match separation (default 0.12)
+--extend Toggle the outward extension of alignments from their anchoring clusters. Setting --noextend will prevent alignment extensions but still align the DNA between clustered matches and create the .delta file (default --extend)
+--forward Align only the forward strands of each sequence
+--maxgap Maximum gap between two adjacent matches in a cluster (default 90)
+--help Print the help information and exit
+--minmatch Minimum length of a maximal exact match (default 20)
+--optimize Toggle alignment score optimization. Setting --nooptimize will prevent alignment score optimization and result in sometimes longer, but lower scoring alignments (default --optimize)
+--reverse Align only the reverse strand of the query sequence to the forward strand of the reference
+--simplify Simplify alignments by removing shadowed clusters. Turn this option off (--nosimplify) if aligning a sequence to itself to look for repeats (default --simplify)
+--version Print the version information and exit
+--coords **Automatically ON in galaxy wrapper!** It generates the .coords file using the 'show-coords' program with the -r option.
+--prefix **Do NOT use in Galaxy wrapper!** Set the output file prefix (default out)
+
+|
+|
+
+
+
+
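Note on MUMmer/mummer_tool.xml: the help text above warns that --prefix must not be used, because the wrapper relies on the default output prefix 'out'. A wrapper could reject such extra options up front, roughly as sketched here. The function name `uses_prefix_option` is hypothetical, and the short form -p is included on the assumption that the installed nucmer/promer accepts it. Abbreviated spellings of the long option are not caught.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the extra option
# string contains an option that changes the output prefix.
uses_prefix_option() (
    set -f                      # no filename expansion while splitting
    for word in $1; do
        case $word in
            -p|--prefix|--prefix=*) exit 0 ;;
        esac
    done
    exit 1
)

if uses_prefix_option "--maxmatch --prefix=run1"; then
    echo "refusing extra options: the output prefix must stay 'out'" >&2
fi
```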
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummer_utilities_tool.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummer_utilities_tool.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,184 @@
+
+ : Show and filter on sequence delta file
+
+
+ $tool.cmd
+ $cmd_extra
+ $input_delta
+ #if $tool.cmd=="show-aligns":
+ $tool.aligns1
+ $tool.aligns2
+ #end if
+ > $out_tool
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ gnuplot
+ MUMmer
+
+
+
+
+
+
+|
+
+
+**Reference**
+=============
+
+- **MUMmer_utilities Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmer utilities running on MUMmer delta file:** http://mummer.sourceforge.net/manual
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**MUMmer Utilities**
+====================
+
+All tools operate on the DELTA file generated by MUMmer! Additional arguments are only required for show-aligns.
+
+Show-coords
+-----------
+
+show-coords parses the delta alignment output of NUCmer and PROmer, and displays summary
+information such as position, percent identity and so on, of each alignment. It is the most
+commonly used tool for analyzing the delta files. *Usually the -r option is used to sort lines by reference.*
+
+
+Show-tiling
+-----------
+
+show-tiling attempts to construct a tiling path out of the query contigs as mapped to the reference
+sequences. Given the delta alignment information of a few long reference sequences and many small
+query contigs, show-tiling will determine the best mapped location of each query contig. Note that
+each contig may only be tiled once, so repetitive regions may cause this program some difficulty.
+This program is useful for aiding in the scaffolding and closure of an unfinished set of contigs,
+if a suitable, high similarity reference genome is available. Or, if using PROmer, show-tiling will
+help in the identification of syntenic regions and their contig's mapping to the references.
+
+This program is not suitable for "many vs. many" assembly comparisons, however a new tool based on
+the concepts of show-tiling should be available in the near future that will facilitate the mapping
+of assembly contigs.
+
+
+Show-snps
+---------
+
+show-snps is a utility program for reporting polymorphisms contained in a delta encoded alignment
+file output by NUCmer or PROmer. It catalogs all of the single nucleotide polymorphisms (SNPs) and
+insertions/deletions within the delta file alignments. Polymorphisms are reported one per line, in
+a delimited fashion similar to show-coords. Pairing this program with the appropriate MUMmer tools
+can create an easy to use SNP pipeline for the rapid identification of putative SNPs between any
+two sequence sets, as demonstrated in the manual SNP detection section.
+
+
+Show-diff
+---------
+
+Outputs a list of structural differences for each sequence in
+the reference and query, sorted by position. For a reference
+sequence R, and its matching query sequence Q, differences are
+categorized as GAP (gap between two mutually consistent alignments),
+DUP (inserted duplication), BRK (other inserted sequence), JMP
+(rearrangement), INV (rearrangement with inversion), SEQ
+(rearrangement with another sequence). The first five columns of
+the output are seq ID, feature type, feature start, feature end,
+and feature length. Additional columns are added depending on the
+feature type. Negative feature lengths indicate overlapping adjacent
+alignment blocks.
+::
+
+ IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
+ IDR DUP dup-start dup-end dup-length
+ IDR BRK gap-start gap-end gap-length
+ IDR JMP gap-start gap-end gap-length
+ IDR INV gap-start gap-end gap-length
+ IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
+
+Positions always reference the sequence with the given ID. The
+sum of the fifth column (ignoring negative values) is the total
+amount of inserted sequence. Summing the fifth column after removing
+DUP features gives the total unique inserted sequence. Note that unaligned
+sequences are not counted, and could represent additional "unique"
+sequences. See documentation for tips on how to interpret these
+alignment break features.
+
+
+Show-aligns
+-----------
+
+show-aligns parses the delta encoded alignment output of NUCmer and PROmer, and displays
+the pair-wise alignments from the two sequences specified on the command line. It is handy
+for identifying the exact location of errors and looking for SNPs between two sequences.
+
+
+Delta-filter
+------------
+
+delta-filter is a utility program for the manipulation of the delta encoded alignment files output
+by the NUCmer and PROmer pipelines. It takes a delta file as input and filters the information based
+on the various command line switches, outputting only the desired alignments to stdout. Options to filter by
+alignment length, identity, uniqueness and consistency are provided. Certain combinations of these
+options can greatly reduce the number of unwanted alignments in the delta file, thus making the output
+of programs such as show-coords more comprehensible.
+
+
+
+**CMD line options (specific for each tool!):**
+===============================================
+
+**Show-coords**
+
+http://mummer.sourceforge.net/manual/#coords
+
+**Show-tiling**
+
+http://mummer.sourceforge.net/manual/#tiling
+
+**Show-snps**
+
+http://mummer.sourceforge.net/manual/#snps
+
+**Show-aligns**
+
+http://mummer.sourceforge.net/manual/#aligns
+
+**Delta-filter**
+
+http://mummer.sourceforge.net/manual/#filter
+
+
+
+
+
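Note on MUMmer/mummer_utilities_tool.xml: the show-diff help above says that the sum of the fifth column, ignoring negative values, is the total inserted sequence, and that the same sum without DUP features is the unique inserted sequence. That arithmetic can be sketched as follows. The function name `sum_inserted` is hypothetical and the four input rows are invented purely to illustrate the calculation; they are not real show-diff output.

```shell
#!/bin/sh
# Hypothetical helper: read show-diff style rows on stdin
# (seq ID, feature type, start, end, length, ...) and report the
# summed positive lengths, with and without DUP features.
sum_inserted() {
    awk '
        $5 ~ /^-?[0-9]+$/ && $5 > 0 {
            total += $5
            if ($2 != "DUP") unique += $5
        }
        END { printf "total=%d unique=%d\n", total, unique }
    '
}

# Invented example rows (not real output).
sum_inserted <<'EOF'
ref1 GAP 100 150 51 40 11
ref1 DUP 200 260 61
ref1 BRK 300 290 -9
ref1 JMP 400 420 21
EOF
```

With these rows the total is 51 + 61 + 21 = 133 and the unique sum is 51 + 21 = 72; the BRK row is skipped because its length is negative.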
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummerplot_tool.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummerplot_tool.sh Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+## simple bash to generate mummerplot of MATCH file
+##
+## Galaxy wrapper by Alex Bossers, CVI of Wageningen UR, Lelystad, NL
+## alex_dot_bossers_at_wur_dot_nl
+##
+##
+## needs a rename of the fixed name to something recognised by galaxy
+## needs cleanout of temp files
+##
+## call is mummerplot $format $in_match $out_file $cmd_extra
+## $0 $1 $2 $3 $4
+##
+## since mummerplot uses some deprecated syntax which can be fixed in the source
+## we redirect STDERR to dev/null to circumvent errorstatus in galaxy
+## io redirects 0=stdin 1=stdout 2=stderr to dev/null (or &-)
+
+# Function to send error messages.
+log_err() { echo "$@" 1>&2; }
+
+# path to where mummer suite is installed
+# adjust this for your machine
+# this is the only hard coded path in the scripts
+mum_path=""
+
+if [ -z "$mum_path" ] && [ -z "$(command -v mummer)" ]; then
+ log_err "mummer is not available in system path and not declared in mum_path. Please install mummer."
+ exit 127
+fi
+
+# some default options to generate a LARGE fixed PNG/POSTSCRIPT image and not an interactive one.
+
+if [ "$1" = "png" ]; then
+ extension="png"
+else
+ extension="ps"
+fi
+
+eval "${mum_path}mummerplot --large --$1 $2 1>/dev/null 2>/dev/null"
+if [ -f "out.$extension" ]; then
+ #conditional move to something known by galaxy
+ mv "out.$extension" "$3"
+ #remove gnuplot file
+ rm -f out.gp
+fi
+
+## clean up
+rm -f out.fplot
+rm -f out.rplot
+
+#end script
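Note on MUMmer/mummerplot_tool.sh: the script above maps the format argument to a file extension and treats every value other than "png" as PostScript. A stricter mapping could be sketched as below. The function name `plot_output_name` is hypothetical, and the file names assume the default output prefix 'out' that the wrapper relies on.

```shell
#!/bin/sh
# Hypothetical helper: map the format argument to the name of the file
# that mummerplot is expected to write with the default prefix 'out'.
plot_output_name() {
    case $1 in
        png)        echo "out.png" ;;
        postscript) echo "out.ps" ;;
        *)          echo "unsupported image format: $1" >&2; return 1 ;;
    esac
}

plot_output_name png
plot_output_name postscript
```

Rejecting unknown formats early gives a clearer error than a missing output file after the plotting step.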
diff -r c1c38335322e -r 479eb076cd23 MUMmer/mummerplot_tool.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/mummerplot_tool.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,115 @@
+
+ : Generate MUMmerplots from MUMmer match file
+
+ mummerplot_tool.sh
+ #if $img_format=="png":
+ png $input_match $out_png
+ #else:
+ postscript $input_match $out_postscript
+ #end if
+ $cmd_extra
+
+
+
+
+
+
+
+
+
+
+
+
+
+ img_format=="png"
+
+
+ img_format=="postscript"
+
+
+
+
+ gnuplot
+ MUMmer
+
+
+
+
+
+
+|
+
+
+**Reference**
+=============
+
+- **MUMmerplot Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+- **MUMmerplot running on MUMmer-match file:** http://mummer.sourceforge.net/manual#mummerplot
+
+- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/
+
+If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve
+or modify the wrappers, please add yourself to the acknowledgement section rather than replacing existing names :)
+
+
+**MUMmerplot**
+==============
+
+| This plotting tool requires a MUMmer match file (either the delta file or the tiling result file)!
+| MUMmerplot requires gnuplot (www.gnuplot.info) to be installed.
+|
+| **By default the wrapper sets the arguments --large and --png/--postscript to generate a fixed image instead of an interactive view!** Optional cmd line arguments can be used.
+|
+
+
+
+Mummerplot is a script utility that takes output from *MUMmer, nucmer or promer* as DELTA file, or the
+*show-tiling* result file, and converts it to a format suitable for plotting with gnuplot. The primary
+plot type is an alignment dotplot where a sequence is laid out on each axis and a point is plotted at
+every position where the two sequences show similarity. As an extension to this plot style, mummerplot
+is also able to offset multiple 1-vs-1 dotplots to form a multiplot where multiple sequences can be
+laid out on each axis. This plot style is especially handy for browsing an alignment of two contig
+sets. Identity plots are also possible by coloring each data point with a color gradient representing
+identity, or by collapsing the y-axis data onto a single line and then vertically offsetting the
+data points by their identities. In addition to producing the plot data, mummerplot also generates a
+gnuplot script that will be evaluated in order to generate the graph.
+
+
+The *match file* can either be a match list from mummer (either 3 or 4 column format),
+the delta file from nucmer or promer, or the default output from show-tiling. mummerplot will
+automatically detect the type of input file it is given, regardless of its file extension, or it
+will fail if the input file is of an unrecognized type.
+
+
+
+Optional command line arguments
+-------------------------------
+
+--breaklen Highlight alignments with a breakpoint further than the given distance from the nearest sequence end
+--nocolor Turn off all plot color; the related --color option colors plot lines with a percent similarity gradient (default: color by match direction)
+--coverage Generate a reference coverage plot, also known as a percent identity plot (default behavior for show-tiling input)
+--depend Print dependency information and exit
+--filter Only display alignments which represent the "best" one-to-one mapping of reference and query subsequences (requires delta formatted input)
+--help Print help information and exit
+--layout Layout a multiplot by ordering and orienting sequences such that the largest hits cluster near the main diagonal (requires delta formatted input)
+--prefix *do not use in galaxy!* Set the output file prefix (default 'out')
+--rv Reverse video, swap the foreground and background colors for x11 plots (requires x11 terminal)
+--IdR Select a specific reference sequence for the x-axis
+--IdQ Select a specific query sequence for the y-axis
+--Rfile Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired reference sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-]
+--Qfile Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired query sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-]
+--size Set the output size to small, medium or large
+--large **enabled by default to generate a high-resolution image**; the other size flags (--small, --medium) therefore have no effect
+--SNP Highlight SNP locations in the alignment
+--terminal *do not use in galaxy* Set the output terminal to x11, postscript or png
+--png **choose either png or postscript for a fixed image**; the interactive x11 terminal is not enabled
+--postscript Alternate output format instead of png.
+--xrange Set the x-range for the plot in the form "[min,max]"
+--yrange Set the y-range for the plot in the form "[min,max]"
+--version Display version information and exit
+
+
+
+
+
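Note on the help text above: it states that the wrapper always passes --large plus either --png or --postscript, followed by any optional arguments. A minimal sketch of how such a command line could be assembled is shown below. The function name `build_mummerplot_cmd` is hypothetical and this is not the code of mummerplot_tool.sh; it only prints the command so it can be inspected without MUMmer installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print the mummerplot command
# for a given image format, match file and optional extra flags.
build_mummerplot_cmd() {
    format=$1       # "png" or "postscript"
    match_file=$2   # delta file, show-tiling output or mummer match list
    shift 2         # everything left over is treated as optional extra flags

    case "$format" in
        png|postscript) ;;
        *) echo "unsupported format: $format" >&2; return 1 ;;
    esac

    # --large and the terminal flag are the fixed defaults named in the help text
    cmd="mummerplot --large --$format"
    for extra in "$@"; do
        cmd="$cmd $extra"
    done
    echo "$cmd $match_file"
}

build_mummerplot_cmd png in.delta --filter
```

Printing the command instead of running it keeps the sketch safe to try; the interactive x11 terminal is rejected because the help text says it is not enabled in Galaxy.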
diff -r c1c38335322e -r 479eb076cd23 MUMmer/nucmer_coords2ACT_galaxy.pl
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/nucmer_coords2ACT_galaxy.pl Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,42 @@
+#!/usr/bin/perl
+
+# converts the MUMmer-nucmer coords file into a file readable by the Artemis Comparison Tool
+# Output format resembles the BLAST crunch format
+#
+# [nov 2010] Galaxy wrapped up version
+#
+# Alex.Bossers@wur.nl
+
+
+use warnings;
+use strict;
+
+#$filename=shift;
+ #$ARGV[0] =~ m/^([A-Z0-9_.-]+)$/ig;
+my $filename = $ARGV[0];
+ #$ARGV[1] =~ m/^([A-Z0-9_.-]+)$/ig;
+my $fileout = $ARGV[1];
+#my $filename = "Curated_vs_noncurated_8067_01.nucmer.coords";
+#my $fileout = "Curated_vs_noncurated_8067_01.nucmer.tab";
+
+open (COORDS, '<', $filename) || die "error opening input coords file: $!";
+open (OUT, '>', $fileout) || die "error opening tab output file: $!";
+
+while (<COORDS>)
+ {
+ unless ($_ =~ /^(\s*)\d/){next}
+ $_ =~ s/\|//g;
+
+ my @f = split;
+ # create crude match score = ((length_of_match * %identity)-(length_of_match * (100 - %identity))) /20
+ my $crude_plus_score=($f[4]*$f[6]);
+ my $crude_minus_score=($f[4]*(100-$f[6]));
+ my $crude_score= int(($crude_plus_score - $crude_minus_score) / 20);
+ # reorganise columns and print crunch format to the output file
+ # score %id S1 E1 seq1 S2 E2 seq2 (description)
+ print OUT " $crude_score $f[6] $f[0] $f[1] $f[7] $f[2] $f[3] $f[8] nucmer comparison coordinates\n"
+ }
+
+close (COORDS);
+close (OUT);
+print "Done!\n\n";
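Note on the conversion above: the crude score is int((length * %identity - length * (100 - %identity)) / 20), and the columns are reordered to score, %id, S1, E1, seq1, S2, E2, seq2. The sketch below mirrors that logic with awk so the arithmetic can be checked on a single row. The function name `coords_to_crunch` and the sample row are made up for illustration, and the column positions assume the same layout the Perl script indexes (length in field 5, identity in field 7 after the separators are removed).

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): read coords rows on stdin,
# write crunch-style rows on stdout.
coords_to_crunch() {
    awk '
        /^[ \t]*[0-9]/ {        # keep only data rows, skip header lines
            gsub(/\|/, "")      # drop column separators; awk re-splits the fields
            score = int(($5 * $7 - $5 * (100 - $7)) / 20)
            # score %id S1 E1 seq1 S2 E2 seq2 description
            print " " score, $7, $1, $2, $8, $3, $4, $9, "nucmer comparison coordinates"
        }'
}

# Made-up sample row: a 1000 bp alignment at 99.50 % identity
printf '%s\n' '       1     1000  |      501     1500  |     1000     1000  |    99.50  | ref1  qry1' | coords_to_crunch
# prints: " 4950 99.50 1 1000 ref1 501 1500 qry1 nucmer comparison coordinates"
```

For this row the score works out as (1000 * 99.5 - 1000 * 0.5) / 20 = 4950, so long, high-identity alignments score highest and alignments below 50 % identity would score negative.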
diff -r c1c38335322e -r 479eb076cd23 MUMmer/nucmer_coords2ACT_galaxy.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/nucmer_coords2ACT_galaxy.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,39 @@
+
+ : convert MUMmer comparison (coords) file to ACT (Artemis)
+
+ nucmer_coords2ACT_galaxy.pl $in_coords $out_act
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+|
+|
+
+**Info**
+--------
+
+This tool will convert the MUMmer comparison file (run MUMmer with the coords option) into a "blast crunch" file
+that can be read as a comparison file in the Artemis Comparison Tool (ACT).
+
+It will output a single tabular crunch file (save it with the .tab extension on Windows systems).
+
+**Reference/questions/remarks**
+
+- *Conversion perl script and wrapper:* Alex Bossers, CVI of Wageningen UR, The Netherlands.
+
+
+
+
+
+
diff -r c1c38335322e -r 479eb076cd23 MUMmer/suite_config.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MUMmer/suite_config.xml Tue Oct 28 16:59:33 2014 +0100
@@ -0,0 +1,22 @@
+
+ This suite contains MUMmer genome alignment tools and parsers
+
+ : Compare genomes by alignment (Nucmer or Promer)
+
+
+ : Maximal exact sequence matching
+
+
+ : order sequence matches in clusters
+
+
+ : Show and filter on sequence delta file
+
+
+ : Generate MUMmerplots from MUMmer match file
+
+
+ : convert MUMmer comparison (coords) file to ACT (Artemis)
+
+
+