changeset 2:479eb076cd23
Add revised mummer toolshed files to testtoolshed
author | abossers |
---|---|
date | Tue, 28 Oct 2014 16:59:33 +0100 |
parents | c1c38335322e |
children | f807110e7c80 |
files | MUMmer/README_mummer MUMmer/mummer_clustering.xml MUMmer/mummer_maxmatch.xml MUMmer/mummer_tool.sh MUMmer/mummer_tool.xml MUMmer/mummer_utilities_tool.xml MUMmer/mummerplot_tool.sh MUMmer/mummerplot_tool.xml MUMmer/nucmer_coords2ACT_galaxy.pl MUMmer/nucmer_coords2ACT_galaxy.xml MUMmer/suite_config.xml |
diffstat | 11 files changed, 1160 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/README_mummer Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,56 @@ +# Created/shared May 2011 +# +# Alex Bossers +# Central Veterinary Institute +# Wageningen University and Research centre +# Lelystad, The Netherlands +# +# Comments/improvements/bugs: Alex (dot) Bossers (at) wur (dot) nl + + +# WHAT IT DOES +The MUMmer suite is a set of very basic wrappers for the MUMmer genome comparison tools. Most common operations should be possible +by using these wrappers. MUMmer works fast on smaller (bacterial) genomes but can also cope with eukaryotic genomes. + +In addition to the original MUMmer tools it also contains an additional conversion script to convert MUMmer comparison files, +the so-called coords files, into a format readable by the Artemis Comparison Tool (ACT; Sanger UK). + + +# REQUIREMENTS +- Perl +- Galaxy :) +- MUMmer newer than version 3.20; + even though older versions might work as well. + Get your MUMmer here: http://mummer.sourceforge.net/ + Make sure MUMmer is in your PATH and/or update the tool xml configs and wrappers for the full MUMmer path + if it is different from /opt/MUMmer/MUMmer. +- ACT can be run locally or via Webstart if you want to visualise genome comparisons in detail: http://www.sanger.ac.uk/resources/software/act +- GNUplot is a requirement for the MUMmerplot part (see MUMmer installation documentation) + + +# SETUP +Just unpack the tool xml and perl script somewhere appropriate and adapt the MUMmer installation part if different from above. Plug the tool in the tool_config.xml +of your galaxy instance and refresh the tools or restart the galaxy server. + + +# TESTING +You can test the code by running Nucmer on the test data and visualise the results in MUMmerplot. +It should return a MUMmerplot identical to the image provided. For reference I also included the corresponding log file. 
+ + +# LICENSE +Copyright (c) 2011 Central Veterinary Institute of Wageningen UR, Lelystad, The Netherlands. +MUMmer is copyright by its respective owner. See their licensing details. + +Our wrappers/programs are free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3 of the License, or +(at your option) any later version. + +When distributing the tools please include this original reference. + +Use this tool at your own risk. Even though we tried to build tools and wrappers that are free of errors, +check your output since it might be erroneous. We will not be liable for any failure this may have caused. + +If you like these scripts, please acknowledge our work. +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummer_clustering.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,238 @@ +<tool id="mummer_clustering" name="MUMmer Clustering" version="0.9.alx" force_history_refresh="True"> + <description>: order sequence matches in clusters</description> + <command> + <!-- update this path to the installed location --> + $tool.cmd + #if $tool.cmd=="gaps": + $in_reference + #if $tool.gaps_r=="yes": + -r + #end if + #end if + #if $tool.cmd=="mgaps": + #if $tool.cmd_C=="yes": + -C + #end if + -d $tool.cmd_d + #if $tool.cmd_e=="yes": + -e + #end if + -f $tool.cmd_f + -l $tool.cmd_l + -s $tool.cmd_s + #end if + < $tool.in_match_list + > $out_tool + + </command> + <inputs> + <conditional name="tool"> + <param name="cmd" type="select" label="MUMmer maximal matching" help="Algorithms are run with default parameters (none). For specific args see help below" > + <option value="gaps" selected="true">gaps</option> + <option value="mgaps">mgaps</option> + </param> + <when value="gaps"> + <param name="in_reference" type="data" format="fasta" label="Reference FastA file" /> + <param name="gaps_r" type="select" label="Use reversed [-r]" > + <option value="no" selected="true">No</option> + <option value="yes">Yes</option> + </param> + <param name="in_match_list" type="data" format="text" label="MUMmer match list" help="See help for more details" /> + </when> + <when value="mgaps"> + <param name="in_match_list" type="data" format="text" label="MUMmer match list" help="See help for more details" /> + <param name="cmd_C" type="select" label="Check input header labels have reversed keyword [-C]" > + <option value="no" selected="true">No</option> + <option value="yes">Yes</option> + </param> + <param name="cmd_d" type="integer" size="5" value="5" label="Max fixed diagonal difference [-d]" /> + <param name="cmd_e" type="select" label="Use extent of cluster [-e]" > + <option value="no" selected="true">No</option> + <option 
value="yes">Yes</option> + </param> + <param name="cmd_f" type="float" size="5" value="0.05" label="Max fraction separation for diagonal difference [-f]" /> + <param name="cmd_l" type="integer" size="5" value="200" label="Min cluster length [-l]" /> + <param name="cmd_s" type="integer" size="5" value="1000" label="Max separation adjacent matches in cluster [-s]" /> + </when> + </conditional> + </inputs> + <outputs> + <data name="out_tool" format="text" label="Clustering output" /> + </outputs> + <requirements> +<!-- <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> --> + <requirement type="package" version="4.6.4">gnuplot</requirement> + <requirement type="package" version="3.23">MUMmer</requirement> + </requirements> + <tests> + <test> + </test> + </tests> + <help> +| + + +**Reference** +============= + +- **MUMmer clustering Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands. + +- **MUMmer suite v3.22:** http://mummer.sourceforge.net + +- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/ + +If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve +or modify the wrappers, please add yourself to the acknowledgement section rather than replacing the existing names :) + + +**MUMmer Clustering** +===================== + +MUMmer's clustering algorithms attempt to order small individual matches into larger match clusters +in order to make the output of mummer more intelligible. A dot plot makes it easy to spot alignment +regions from a match list, however when examining the data without graphic aids, it is very difficult +to draw any reasonable conclusions from the simple flat file list of matches. Clustering the matches +together into larger groups of neighboring matches makes this process much easier by ordering the +data and removing spurious matches. 
+ + +Gaps +---- + +*gaps* is the primary clustering algorithm for run-mummer1, and although classified as a "clustering" +step, gaps is more of a sorting routine. It implements the LIS (longest increasing subset) algorithm +to extract the longest consistent set of matches between two sequences, and generates a single +cluster that represents the best "straight-line" arrangement of matches between the sequences. By +straight-line, we mean no rearrangements or inversions, just a simple path of agreeing matches +between the two sequences. This limits the usability of this program to the alignment of genomes +that are very similar and with no large scale mutations. *gaps* is best suited for the comparison of +near identical sequences with the goal of finding minor mutations like SNPs and small indels. + +Input can be filtered mummer output. The strange syntax is a result of a legacy issue described in +the Known problems (manual) section, and requires the header be stripped from the mummer output. In +addition, gaps is only designed to handle a single reference and a single query sequence, thus the +preceding mummer run must also follow this constraint. The -r is optional and designates the incoming +matches as reverse complement matches which must reference the reverse complement of the sequence, +therefore forcing mummer to be run without the -c option. + +Reference: http://mummer.sourceforge.net/manual/#gaps + +**Output:** +:: + + > /home/aphillip/data/GHP.1con Consistent matches + 183 17 22 none - - + 238 72 108 none 33 33 + 347 181 92 none 1 1 + 458 292 50 none 19 19 + 705 539 44 none 1 1 + 750 584 38 none 1 1 + 807 641 23 -16 0 4 + (output continues ...) + > Wrap around + 334398 329917 47 none - 225 + 334446 329965 62 none 1 1 + 334539 330058 20 none 31 31 + 334560 330079 92 none 1 1 + 334653 330172 77 none 1 1 + 334740 330259 41 none 10 10 + (output continues ...) 
+ > /home/aphillip/data/GHP.1con Other matches + 1317231 4891 21 none - - + 1317275 4927 21 none - - + 1317804 5399 25 none 508 451 + 947580 5436 36 none - - + 23406 5518 34 none - - + 333079 6592 32 none - - + (output continues ...) + +Where the first line is the location of the reference file, and the first three columns are the same +as the three column match format described in the mummer section. The final three columns are the +overlap between this match and the previous match, the gap between the start of this match and the +end of the previous match in the reference, and the gap between the start of this match and the end +of the previous match in the query respectively. + + +mgaps +----- + +*mgaps* was introduced into the MUMmer pipeline in an effort to better handle large-scale +rearrangements and duplications. Unlike gaps, mgaps is a full clustering algorithm that is capable +of generating multiple groups of consistently ordered matches. Clustering is controlled by a set of +command-line parameters that adjust the minimum cluster size, maximum gap between matches, etc. Only +matches that were included in clusters will appear in the output, so by adjusting the command-line +parameters it is possible to filter out many of the spurious matches, thus leaving only the larger +areas of conservation between the input sequences. The major advantage of mgaps is its ability to +identify these "islands" of conservation. This frees the user from the single LIS restraints of the +gaps program and allows for the identification of large-scale rearrangements, duplications, gene +families and so on. + +Gaps can fail to identify clusters because they were not consistent with the LIS. However, by using +mgaps, all regions of conservation can now been identified. The only fallback being the increased +complexity of the output, where you once had only one cluster for the whole comparison, you usually +now get more. 
Because of this, it can sometimes be difficult separating the repetitive clusters from +"correct" clusters, *making mgaps more suited for global alignments instead of localized error detection*. + +Input can be raw mummer output. *mgaps* is only designed to handle a single reference and one or +more query sequences, thus the preceding mummer run must also follow this constraint. Please refer +to the run-mummer3 script (see online manual) for an example of how to use this program in an +alignment pipeline. Note that in order to cluster reverse complement matches, the reverse complement +matches must reference the reverse complement strand of the query sequence, therefore forcing mummer +to be run without the -c option. A rewrite of this algorithm to handle multiple reference sequences +and a better coordinate system (forward coordinates for reverse complement matches) is doubtful but +may eventually appear. + +The -d option can be interpreted as the number of insertions allowed between two matches in the same +cluster, while the -f option is a fraction equal to (diagonal difference / match separation) where +a higher value will increase the indel tolerance. Minimum cluster length is the sum of the contained +matches unless the -e option is used. The best way to get a feel for what each parameter controls +is to cluster the same data set numerous times with different values and observe the resulting +differences. It can also be helpful to set these parameters to the size of the element you wish to +capture, i.e. set the minimum cluster size to say the smallest exon you expect and set the max gap +to the smallest intron you expect to obtain clusters that could represent single exons (depending +of course of the similarity of the two sequences). 
+ +Reference: http://mummer.sourceforge.net/manual/#mgaps + +**Output format** + +Output of *mgaps* shares much in common with the output of mummer and gaps, with a slightly different +header formatting than gaps to allow for multiple query sequences and multiple clusters. The output +of mgaps run on both forward and reverse complement matches is as follows: +:: + + > ID41 + > ID41 Reverse + 5177399 1 232 none - - + 5177632 234 6794 none 1 1 + 5184433 7035 24 none 7 7 + 5184468 7069 23 none 11 10 + > ID42 + 10181 43 1521 none - - + > ID42 Reverse + 4654536 17 36 none - - + 4654578 57 298 none 6 4 + 4654877 356 226 none 1 1 + # + 4655139 845 28 none - - + 4655178 884 694 none 11 11 + 4655873 1579 20 none 1 1 + # + 4850044 17 1492 none - - + 4851537 1510 711 none 1 1 + 4852249 2222 42 none 1 1 + (output continues ...) + + +Headers containing the ID for each query sequence are listed after the '>' characters, and a +following Reverse keyword identifies the reverse matches for that query sequence. Individual clusters +for each sequence are separated by a '#' character, and the six columns are exactly the same as the +gaps output (see the gaps section for more details). + + +| +| + + </help> +</tool> +
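Note on the mgaps output format documented in the help text above: the layout ('>' headers with an optional Reverse keyword, '#' between clusters, six columns per match) can be read back with a few lines of Python. The following is only a minimal sketch under the assumption that columns are whitespace-separated exactly as in the sample; the function name `parse_mgaps` and the dictionary layout are hypothetical and are not part of MUMmer or of these wrappers.

```python
def parse_mgaps(text):
    """Parse mgaps output into a list of cluster dictionaries.

    Assumed layout (taken from the sample in the help text):
      '> ID [Reverse]' starts the matches of one query sequence/strand,
      '#' separates clusters, and every other line holds six columns.
    """
    clusters = []
    current = None
    query, reverse = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            query = fields[0] if fields else None
            reverse = len(fields) > 1 and fields[-1] == "Reverse"
            current = None  # the next match line opens a new cluster
        elif line == "#":
            current = None  # cluster separator
        else:
            cols = line.split()
            if len(cols) != 6:
                continue  # ignore anything that is not a six-column match line
            if current is None:
                current = {"query": query, "reverse": reverse, "matches": []}
                clusters.append(current)
            ref_start, qry_start, length = (int(c) for c in cols[:3])
            # the last three columns may be 'none' or '-', so keep them as text
            current["matches"].append((ref_start, qry_start, length, *cols[3:]))
    return clusters


# Excerpt of the sample output shown in the help text.
sample = """\
> ID42
10181 43 1521 none - -
> ID42 Reverse
4654536 17 36 none - -
4654578 57 298 none 6 4
#
4655139 845 28 none - -
"""

for cluster in parse_mgaps(sample):
    strand = "reverse" if cluster["reverse"] else "forward"
    total = sum(m[2] for m in cluster["matches"])
    print(cluster["query"], strand, len(cluster["matches"]), total)
```

Summing the third column per cluster gives the cluster length as used by the -l option when -e is not set, which makes it easy to see the effect of different parameter values on the same data set.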
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummer_maxmatch.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,170 @@ +<tool id="mummer_maxmatch" name="MUMmer MaxMatch" version="0.9.alx" force_history_refresh="True"> + <description>: Maximal exact sequence matching</description> + <command> + <!-- update this path to the installed location --> + $tool.cmd + #if $tool.cmd=="mummer": + $tool.cmd_extra + $tool.mum_ref_in + $tool.mum_q_in + #end if + #if $tool.cmd=="repeat-match": + -n $tool.rm_n + #if $tool.rm_E=="yes": + -E + #end if + $tool.cmd_extra + $tool.in_seq + #end if + #if $tool.cmd=="exact-tandems": + $tool.in_seq + $tool.et_minl + #end if + <!-- unfortunate somehow error state gets set also on succesfull jobs. Pipe io stderr to dev/null --> + 2>&- + > $out_tool + + </command> + <inputs> + <conditional name="tool"> + <param name="cmd" type="select" value="mummer" label="MUMmer maximal matching" help="Algorithms are run with default parameters (none). For specific args see help below" > + <option value="mummer">mummer</option> + <option value="repeat-match">repeat-match</option> + <option value="exact-tandems">exact-tandems</option> + </param> + <when value="mummer"> + <param name="mum_ref_in" type="data" format="fasta" label="Reference FastA file" /> + <param name="mum_q_in" type="data" format="fasta" label="Query (multi) FastA sequence" /> + <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="See specific cmd line options below for each tool" /> + </when> + <when value="repeat-match"> + <param name="in_seq" type="data" format="fasta" label="FastA sequence file" /> + <param name="rm_n" type="text" size="5" value="20" label="Minimum exact match length [-n]" /> + <param name="rm_E" type="select" value="no" label="Use exhaustive (slow) search to find matches [-E]" > + <option value="no">No</option> + <option value="yes">Yes</option> + </param> + <param name="cmd_extra" type="text" size="40" value="" label="Extra 
cmd line options" help="-n and -E are configured above. More specific cmd line options in help below." /> + </when> + <when value="exact-tandems"> + <param name="in_seq" type="data" format="fasta" label="FastA sequence file" /> + <param name="et_minl" type="text" size="5" value="20" label="Minimum length" /> + </when> + </conditional> + </inputs> + <outputs> + <data name="out_tool" format="text" label="Max exact match output" /> + </outputs> + <requirements> +<!-- <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> --> + <requirement type="package" version="4.6.4">gnuplot</requirement> + <requirement type="package" version="3.23">MUMmer</requirement> + </requirements> + <tests> + <test> + </test> + </tests> + <help> +| + + +**Reference** +============= + +- **MUMmer MaxExactMatch Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands. + +- **MUMmer suite v3.22:** http://mummer.sourceforge.net + +- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/ + +Please do not use any of the command line options that modify prefixes or file names. Obviously, +they are of little use within Galaxy and are likely to make the run fail! + +If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve +or modify the wrappers, please add yourself to the acknowledgement section rather than replacing the existing names :) + + + +**MUMmer Maximal exact matching** +================================= + +The heart of the MUMmer package is its suffix tree based maximal matching routines. These can be +used for repeat detection within a single sequence as is done by *repeat-match* and *exact-tandems*, +or can be used for the alignment of two or more sequences as is done by *mummer*. + +Mummer +------ + +mummer is a suffix tree algorithm designed to find maximal exact matches of some minimum length +between two input sequences. 
by default mummer will only find maximal matches that are unique in +the entire set of reference sequences. The match lists produced by mummer can be used alone to +generate alignment dot plots, or can be passed on to the clustering algorithms for the identification +of longer non-exact regions of conservation. These match lists have great versatility because they +contain huge amounts of information and can be passed forward to other interpretation programs for +clustering, analysis, searching, etc. + + +Repeat-match +------------ + +repeat-match is a suffix tree algorithm designed to find maximal exact repeats within a single input +sequence. It uses a similar algorithm to mummer, but altered slightly to find maximal exact matches +within a single sequence. + +Output formatting varies depending on the command line parameters and the output can be quite large. +The standard output format that results from running repeat-match with default parameters is as follows: +:: + + Long Exact Matches: + Start1 Start2 Length + 4919485 4919506r 22 + +The three columns are the first position of the repeat, the second position of the repeat, and the +length of the repeat respectively. Reverse complement repeat positions are denoted by an 'r' +following the Start2 position, and are relative to the forward strand of the sequence. + + +Exact-tandems +------------- + +exact-tandems is a wrapper script for the repeat-match program. It provides a list of exact tandem +repeats within a single input sequence. As with repeat-match the sequence file should contain only +one sequence in FastA format, however if multiple sequences exist the first one will be used. The +sequence may contain any set of upper and lowercase characters, thus DNA and protein sequence are +both allowed and matching is case insensitive. The minimum match length parameter should be a +positive integer, this value will be passed to the repeat-match program via the -n option. 
+ +The output format of exact-tandems is as follows: +:: + + Finding matches + Tandem repeats + Start Extent UnitLen Copies + 416173 150 45 3.3 + +The four columns are the first position of the tandem, the extent of the repeat region, the length +of each tandem repeat unit, and the number of repeat units respectively. + + + +**Manuals and CMD line options (specific for each tool!):** +=========================================================== + +**Mummer** + +http://mummer.sourceforge.net/manual/#mummer + +**Repeat-match** + +http://mummer.sourceforge.net/manual/#repeat + +**exact-tandems** + +http://mummer.sourceforge.net/manual/#exact + +| +| + + </help> +</tool> +
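Note on the repeat-match output documented in the help text above: the three columns (Start1, Start2 with an optional trailing 'r' for reverse complement repeats, Length) can be turned into records as sketched below. This assumes the default output layout shown in the sample; the function name `parse_repeat_match` and the record keys are hypothetical and not part of MUMmer.

```python
def parse_repeat_match(text):
    """Parse default repeat-match output into a list of repeat records.

    Header lines ('Long Exact Matches:', 'Start1 Start2 Length') are skipped
    because their first field is not a number.
    """
    repeats = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 3 or not cols[0].isdigit():
            continue
        start2 = cols[1]
        # a trailing 'r' marks a reverse complement repeat position
        is_reverse = start2.endswith("r")
        if is_reverse:
            start2 = start2[:-1]
        repeats.append(
            {
                "start1": int(cols[0]),
                "start2": int(start2),
                "reverse": is_reverse,
                "length": int(cols[2]),
            }
        )
    return repeats


# Sample taken from the help text.
sample = """\
Long Exact Matches:
   Start1     Start2    Length
  4919485   4919506r        22
"""

for repeat in parse_repeat_match(sample):
    print(repeat)
```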
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummer_tool.sh Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,128 @@ +#!/bin/bash +## use #!/bin/bash -x for debugging + +## Galaxy wrapper for MUMmer (nucmer/promer) +## Alex Bossers, CVI of Wageningen UR, NL +## alex_dot_bossers_at_wur_dot_nl +## +## Sep 2010 +## +## Wrapper runs MUMmer nucmer/promer and additional args +## Calculates the comparison scores (delta and optional coords file) +## Generates the optional STATIC comparison mummerplot to png (from delta file) +## +## finally the script renames (optional) output files to outfiles expected by Galaxy +## +## +## INPUT args: +## mummer_tool.sh $input_ref $input_query $out_delta $out_coords $out_png $logfile +## @0 @1 @2 @3 @4 @5 +## $algorithm $keep_delta $make_coords $keep_log $make_image $cmd_extra +## @6 @7 @8 @9 @10 @11 +## + +# Function to send error messages. +log_err() { echo "$@" 1>&2; } +# path to where mummer suite is installed +# adjust this for your machine +# If mummer is available in system path, leave empty +# when using different path make sure the trailing slash is added. +# mum_path = /opt/Mummer23/Mummer/ +mum_path="" +tmp_path="/tmp/mummertmp/" + +if [ -z "$mum_path" ] && [ -z "$(which mummer)" ]; then + log_err "mummer is not available in system path and not declared in mum_path. Please install mummer." 
+ exit 127 +fi + +# since we have more than 9 arguments we need to shift the sections or use own array +args=("$@") +# to keep things readable assign vars +input_ref="${args[0]}" +input_query="${args[1]}" +out_delta="${args[2]}" +out_coords="${args[3]}" +out_png="${args[4]}" +logfile="${args[5]}" +algorithm="${args[6]}" +keep_delta="${args[7]}" +make_coords="${args[8]}" +keep_log="${args[9]}" +make_image="${args[10]}" +cmd_extra="${args[11]}" + +[ -d "$tmp_path" ] || mkdir -p "$tmp_path" +cd "$tmp_path" + +# enable/disable the STDERR log file +if [ "$keep_log" == "yes" ]; then + logfile_c="2>$logfile" + logfile_a="2>>$logfile" +else + # close stderr so the messages are discarded + logfile_c="2>&-" + logfile_a="2>&-" +fi + +# extra mummer cmd line options + +## generate coords file on the fly? +if [ "$make_coords" == "yes" ]; then + options=" --coords" +fi +## extra cmd line args to be concatenated in options? We need to prevent extra spaces! +if [ "$cmd_extra" != "" ]; then + if [ "$options" == "" ]; then + options=" $cmd_extra" + else + options="$options $cmd_extra" + fi +fi + +# run nucmer/promer +# May only run Promer and Nucmer +echo $algorithm +if [[ $algorithm =~ ...mer$ ]]; then + eval "$mum_path$algorithm$options $input_ref $input_query $logfile_c" +else + log_err 'ERROR, algorithm does not conform to ...mer' + exit 1 +fi + + +## generate large png if option make_image = yes +## suppress errors from mummerplot since some are deprecation warnings and not real errors +## error can be easily avoided by modifying the source of mummerplot... just in case +## however we need to check if a valid png was generated. This is not the case if there is no alignment +## 1 is stdout and 2 is stderr. stdout is closed here +if [ "${make_image}" == "yes" ]; then + eval "${mum_path}mummerplot --large --png out.delta 1>&- $logfile_a" + if [ -f "out.png" ]; then + mv out.png "$out_png" + #cleanup temp gnuplot file + rm out.gp + else + log_err "the required png file does not exist!" 
+ exit 1 + fi + + ## clean up remaining files + rm out.fplot + rm out.rplot + +fi + +# keep/rename or delete delta file +if [ "$keep_delta" == "yes" ]; then + mv out.delta "$out_delta" +else + rm out.delta +fi + +# keep/rename coords file if it was created +if [ "$make_coords" == "yes" ]; then + mv out.coords "$out_coords" +fi +# end script +exit 0 \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummer_tool.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,114 @@ +<tool id="mummer_tool" name="MUMmer compare and plot" version="0.4.alx" force_history_refresh="True"> + <description>: Compare and plot genomes (Nucmer or Promer)</description> + <command interpreter="bash"> + mummer_tool.sh + $input_ref $input_query + $out_delta $out_coords $out_png $out_log + $algorithm + $keep_delta $make_coords $keep_log $make_image + $cmd_extra + </command> + <inputs> + <param name="algorithm" type="select" format="text" value="nucmer" label="Algorithm" help="Nucmer dna or Promer protein (FASTA: protein. Dna is six frame translated)"> + <option value="nucmer">Nucmer DNA</option> + <option value="promer">Promer</option> + </param> + <param name="input_ref" type="data" format="fasta" label="Reference sequence" /> + <param name="input_query" type="data" format="fasta" label="Sequence query file"/> + <param name="make_image" type="select" format="text" value="yes" label="Generate MUMmerplot" help="MUMmerplot will be run with default settings and --large --png as fixed image."> + <option value="yes">Yes</option> + <option value="no">No</option> + </param> + <param name="keep_delta" type="select" format="text" value="no" label="Keep delta file" help="i.e. for further processing"> + <option value="no">No</option> + <option value="yes">Yes</option> + </param> + <param name="make_coords" type="select" format="text" value="yes" label="Make coords file" help="Uses the -r argument to sort lines by reference."> + <option value="no">No</option> + <option value="yes">Yes</option> + </param> + <param name="keep_log" type="select" format="text" value="no" label="Keep console log file" help="i.e. 
for debugging"> + <option value="no">No</option> + <option value="yes">Yes</option> + </param> + <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="the --coords is run by default" /> + </inputs> + <outputs> + <data name="out_coords" format="tabular" label="${algorithm.value_label} coords"> + <filter>make_coords=="yes"</filter> + </data> + <data name="out_delta" format="tabular" label="${algorithm.value_label} delta"> + <filter>keep_delta=="yes"</filter> + </data> + <data name="out_png" format="png" label="${algorithm.value_label} mummerplot"> + <filter>make_image=="yes"</filter> + </data> + <data name="out_log" format="tabular" label="Console log file"> + <filter>keep_log=="yes"</filter> + </data> + </outputs> + <requirements> + <requirement type="package" version="4.6">gnuplot</requirement> + <requirement type="package" version="3.23">MUMmer</requirement> + </requirements> + <tests> + <test> + <param name="algorithm" value="nucmer" /> + <param name="input_ref" value="test.seq1.fasta"/> + <param name="input_query" value="test.seq2.fasta"/> + <param name="make_image" value="yes"/> + <param name="keep_delta" value="no" /> + <param name="make_coords" value="no" /> + <param name="keep_log" value="yes" /> + + <output name="out_log" file="test.MUMmerplot.result.Nucmer_galaxy.log" /> + <output name="out_png" file="test.MUMmerplot.result.Nucmer_galaxy.png" /> + </test> + </tests> + <help> +| + + +**Reference** +------------- + +- **Nucmer Galaxy tool wrapper: Alex Bossers, CVI of Wageningen UR, The Netherlands.** + +- **Nucmer or Promer of MUMmer suite:** v3.22 http://mummer.sourceforge.net/manual/ + +- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/ + + +If you found these tools/wrappers useful in your research, please acknowledge our work. 
If you improve +or modify the wrappers, please add yourself to the acknowledgement section rather than replacing the existing names :) + + +**Command line arguments** +-------------------------- + +--mum Use anchor matches that are unique in both the reference and query +--mumreference Use anchor matches that are unique in the reference but not necessarily unique in the query (default behavior) +--maxmatch Use all anchor matches regardless of their uniqueness +--breaklen Distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200) +--mincluster Minimum cluster length (default 65) +--delta Toggle the creation of the delta file. Setting --nodelta prevents the alignment extension step and only outputs the match clusters (default --delta) +--depend Print the dependency information and exit +--diagfactor Maximum diagonal difference factor for clustering, i.e. diagonal difference / match separation (default 0.12) +--extend Toggle the outward extension of alignments from their anchoring clusters. Setting --noextend will prevent alignment extensions but still align the DNA between clustered matches and create the .delta file (default --extend) +--forward Align only the forward strands of each sequence +--maxgap Maximum gap between two adjacent matches in a cluster (default 90) +--help Print the help information and exit +--minmatch Minimum length of a maximal exact match (default 20) +--optimize Toggle alignment score optimization. Setting --nooptimize will prevent alignment score optimization and result in sometimes longer, but lower scoring alignments (default --optimize) +--reverse Align only the reverse strand of the query sequence to the forward strand of the reference +--simplify Simplify alignments by removing shadowed clusters. 
Turn this option off (--nosimplify) if aligning a sequence to itself to look for repeats (default --simplify) +--version Print the version information and exit +--coords **Automatically ON in galaxy wrapper!** It generates the .coords file using the 'show-coords' program with the -r option. +--prefix **Do NOT use in Galaxy wrapper!** Set the output file prefix (default out) + +| +| + + </help> +</tool> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummer_utilities_tool.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,184 @@ +<tool id="mummer_utilities_tool" name="MUMmer utilities" version="0.9.alx" force_history_refresh="True"> + <description>: Show and filter on sequence delta file</description> + <command> + <!-- update this path to the installed location --> + $tool.cmd + $cmd_extra + $input_delta + #if $tool.cmd=="show-aligns": + $tool.aligns1 + $tool.aligns2 + #end if + > $out_tool + </command> + <inputs> + <conditional name="tool"> + <param name="cmd" type="select" value="show-snps" label="MUMmer utility" help="Utilities are run with default parameters (none). For utility specific args see help below" > + <option value="show-snps">show SNPs</option> + <option value="show-tiling">show tiling</option> + <option value="show-diff">show diff</option> + <option value="show-coords">show coords</option> + <option value="show-aligns">show aligns</option> + <option value="delta-filter">delta filter</option> + </param> + <when value="show-aligns"> + <param name="aligns1" type="text" size="40" value="" label="IdR" help="the FastA header tag of the desired reference sequence" /> + <param name="aligns2" type="text" size="40" value="" label="IdQ" help="the FastA header tag of the desired query sequence" /> + </when> + <when value="show-snps" /> + <when value="show-tiling" /> + <when value="show-coords" /> + <when value="show-diff" /> + <when value="delta-filter" /> + </conditional> + <param name="input_delta" type="data" format="tabular" label="MUMmer delta file" /> + <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="see specific cmd line options below for each tool" /> + </inputs> + <outputs> + <data name="out_tool" format="text" /> + </outputs> + <requirements> +<!-- <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> --> + <requirement type="package" version="4.6.4">gnuplot</requirement> + 
<requirement type="package" version="3.23">MUMmer</requirement> + </requirements> + <tests> + <test> + </test> + </tests> + <help> +| + + +**Reference** +============= + +- **MUMmer_utilities Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands. + +- **MUMmer utilities running on MUMmer delta file:** http://mummer.sourceforge.net/manual + +- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/ + +If you found these tools/wrappers useful in your research, please acknowledge our work. If you improve +or modify the wrappers, please add yourself to the acknowledgement section instead of replacing existing names :) + + +**MUMmer Utilities** +==================== + +All tools use the MUMmer-generated DELTA file! Additional arguments are only required for show-aligns. + +Show-coords +----------- + +show-coords parses the delta alignment output of NUCmer and PROmer, and displays summary +information such as position, percent identity and so on, of each alignment. It is the most +commonly used tool for analyzing the delta files. *Usually the -r option is used to sort lines by reference* + + +Show-tiling +----------- + +show-tiling attempts to construct a tiling path out of the query contigs as mapped to the reference +sequences. Given the delta alignment information of a few long reference sequences and many small +query contigs, show-tiling will determine the best mapped location of each query contig. Note that +each contig may only be tiled once, so repetitive regions may cause this program some difficulty. +This program is useful for aiding in the scaffolding and closure of an unfinished set of contigs, +if a suitable, high similarity reference genome is available. Or, if using PROmer, show-tiling will +help in the identification of syntenic regions and their contig's mapping to the references. + +This program is not suitable for "many vs.
many" assembly comparisons, however a new tool based on +the concepts of show-tiling should be available in the near future that will facilitate the mapping +of assembly contigs. + + +Show-snps +--------- + +show-snps is a utility program for reporting polymorphisms contained in a delta encoded alignment +file output by NUCmer or PROmer. It catalogs all of the single nucleotide polymorphisms (SNPs) and +insertions/deletions within the delta file alignments. Polymorphisms are reported one per line, in +a delimited fashion similar to show-coords. Pairing this program with the appropriate MUMmer tools +can create an easy to use SNP pipeline for the rapid identification of putative SNPs between any +two sequence sets, as demonstrated in the manual SNP detection section. + + +Show-diff +--------- + +Outputs a list of structural differences for each sequence in +the reference and query, sorted by position. For a reference +sequence R, and its matching query sequence Q, differences are +categorized as GAP (gap between two mutually consistent alignments), +DUP (inserted duplication), BRK (other inserted sequence), JMP +(rearrangement), INV (rearrangement with inversion), SEQ +(rearrangement with another sequence). The first five columns of +the output are seq ID, feature type, feature start, feature end, +and feature length. Additional columns are added depending on the +feature type. Negative feature lengths indicate overlapping adjacent +alignment blocks. +:: + + IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff + IDR DUP dup-start dup-end dup-length + IDR BRK gap-start gap-end gap-length + IDR JMP gap-start gap-end gap-length + IDR INV gap-start gap-end gap-length + IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence + +Positions always reference the sequence with the given ID. The +sum of the fifth column (ignoring negative values) is the total +amount of inserted sequence. 
Summing the fifth column after removing +DUP features gives the total unique inserted sequence. Note that unaligned +sequences are not counted, and could represent additional "unique" +sequences. See documentation for tips on how to interpret these +alignment break features. + + +Show-aligns +----------- + +show-aligns parses the delta encoded alignment output of NUCmer and PROmer, and displays +the pair-wise alignments from the two sequences specified on the command line. It is handy +for identifying the exact location of errors and looking for SNPs between two sequences. + + +Delta-filter +------------ + +delta-filter is a utility program for the manipulation of the delta encoded alignment files output +by the NUCmer and PROmer pipelines. It takes a delta file as input and filters the information based +on the various command line switches, outputting only the desired alignments to stdout. Options to filter by +alignment length, identity, uniqueness and consistency are provided. Certain combinations of these +options can greatly reduce the number of unwanted alignments in the delta file, thus making the output +of programs such as show-coords more comprehensible. + + + +**CMD line options (specific for each tool!):** +=============================================== + +**Show-coords** + +http://mummer.sourceforge.net/manual/#coords + +**Show-tiling** + +http://mummer.sourceforge.net/manual/#tiling + +**Show-snps** + +http://mummer.sourceforge.net/manual/#snps + +**Show-aligns** + +http://mummer.sourceforge.net/manual/#aligns + +**Delta-filter** + +http://mummer.sourceforge.net/manual/#filter + + + </help> +</tool> +
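The fifth-column sums described in the Show-diff help text can be sketched with awk. This is only a minimal sketch, assuming the show-diff output is whitespace-delimited with the feature type in column 2 and the feature length in column 5, as the help text describes. The function names `total_inserted` and `unique_inserted` are hypothetical and not part of the wrapper.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the Galaxy wrapper).
# Assumed row layout: seqID, feature type, start, end, length, ...
# Rows whose fifth field is not an integer (e.g. header lines) are skipped.

# Total inserted sequence: sum of the positive feature lengths in column 5.
total_inserted() {
    awk '$5 ~ /^-?[0-9]+$/ && $5 > 0 { sum += $5 } END { print sum + 0 }' "$1"
}

# Unique inserted sequence: the same sum after removing DUP features.
unique_inserted() {
    awk '$2 != "DUP" && $5 ~ /^-?[0-9]+$/ && $5 > 0 { sum += $5 } END { print sum + 0 }' "$1"
}
```

Each function takes the path of a saved show-diff result as its only argument, for example `total_inserted diff_output.txt`, where the file name is a placeholder.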
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummerplot_tool.sh Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,52 @@ +#!/bin/bash + +## simple bash to generate mummerplot of MATCH file +## +## Galaxy wrapper by Alex Bossers, CVI of Wageningen UR, Lelystad, NL +## alex_dot_bossers_at_wur_dot_nl +## +## +## needs a rename of the fixed name to something recognised by galaxy +## needs cleanout of temp files +## +## call is mummerplot $format $in_match $out_file $cmd_extra +## $0 $1 $2 $3 $4 +## +## since mummerplot uses some deprecated syntax which can be fixed in the source +## we redirect STDERR to dev/null to circumvent errorstatus in galaxy +## io redirects 0=stdin 1=stdout 2=stderr to dev/null (or &-) + +# Function to send error messages. +log_err() { echo "$@" 1>&2; } + +# path to where mummer suite is installed +# adjust this for your machine +# this is the only hard coded path in the scripts +mum_path="" + +if [ "$(which mummer)" == "" ] && [ "$mum_path" == "" ]; then + log_err "mummer is not available in system path and not declared in mum_path. Please install mummer." + exit 127 +fi + +# some default options to generate a LARGE fixed PNG/POSTSCRIPT image and not an interactive one. + +if [ "$1" = "png" ]; then + extension="png" +else + extension="ps" +fi + +eval "$mum_path mummerplot --large --$1 $2 1>&- 2>&-" +if [ -f "out.$extension" ]; then + #conditional move to something known by galaxy + mv "out.$extension" "$3" + #remove gnuplot file + rm -f out.gp +fi + +## clean up +rm -f out.fplot +rm -f out.rplot + +#end script
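The availability check in the script above could also be written as a small lookup function. The following is a sketch under stated assumptions, not the wrapper's actual code: the function name `resolve_tool` is hypothetical, and it assumes an explicitly configured directory (such as `mum_path`) should be preferred over the system PATH.

```shell
#!/bin/bash
# Hypothetical helper (not part of the wrapper): print the path of a tool.
# $1 = tool name, $2 = optional install directory (may be empty).
# Returns 127 when the tool cannot be found, matching the script's exit code.
resolve_tool() {
    local tool="$1" dir="$2"
    if [ -n "$dir" ] && [ -x "$dir/$tool" ]; then
        # An explicitly configured directory wins.
        printf '%s\n' "$dir/$tool"
    elif command -v "$tool" >/dev/null 2>&1; then
        # Otherwise fall back to the system PATH.
        command -v "$tool"
    else
        return 127
    fi
}
```

A caller could then run something like `plot_bin=$(resolve_tool mummerplot "$mum_path") || exit 127` before invoking the tool.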
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/mummerplot_tool.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,115 @@ +<tool id="mummerplot_tool" name="MUMmer plot" version="1.0.1" force_history_refresh="true"> + <description>: Generate MUMmerplots from MUMmer match file</description> + <command interpreter="bash"> + mummerplot_tool.sh + #if $img_format=="png": + png $input_match $out_png + #else: + postscript $input_match $out_postscript + #end if + $cmd_extra + </command> + <inputs> + <param name="input_match" type="data" format="tabular" label="MUMmer match (delta or tiling) file" /> + <!-- <conditional name="outType"> --> + <param name="img_format" type="select" label="Output format" > + <option value="png" selected="true">PNG image</option> + <option value="postscript">Postscript</option> + </param> + <!-- </conditional> --> + <param name="cmd_extra" type="text" size="40" value="" label="Extra cmd line options" help="See cmd line options below" /> + </inputs> + <outputs> + <data name="out_png" format="png" label="MUMmerplot png"> + <filter>img_format=="png"</filter> + </data> + <data name="out_postscript" format="ps" label="MUMmerplot ps"> + <filter>img_format=="postscript"</filter> + </data> + </outputs> + <requirements> +<!-- <requirement type="set_environment" version="3.23">MUMMER_PATH</requirement> --> + <requirement type="package" version="4.6.4">gnuplot</requirement> + <requirement type="package" version="3.23">MUMmer</requirement> + </requirements> + <tests> + <test> + </test> + </tests> + <help> +| + + +**Reference** +============= + +- **MUMmerplot Galaxy tool wrapper:** Alex Bossers, CVI of Wageningen UR, The Netherlands + +- **MUMmerplot running on MUMmer-match file:** http://mummer.sourceforge.net/manual#mummerplot + +- **MUMmer tutorials:** http://mummer.sourceforge.net/examples/ + +If you found these tools/wrappers useful in your research, please acknowledge our work.
If you improve +or modify the wrappers, please add yourself to the acknowledgement section instead of replacing existing names :) + + +**MUMmerplot** +============== + +| This plotting tool requires a MUMmer match file (either the delta file or the tiling result file)! +| MUMmerplot requires gnuplot (www.gnuplot.info) to be installed. +| +| **By default the wrapper sets the arguments --large and --png/--postscript to generate a fixed image instead of an interactive view!** Optional cmd line arguments can be used. +| + + + +Mummerplot is a script utility that takes output from *MUMmer, nucmer or promer* as a DELTA file, or the +*show-tiling* result file, and converts it to a format suitable for plotting with gnuplot. The primary +plot type is an alignment dotplot where a sequence is laid out on each axis and a point is plotted at +every position where the two sequences show similarity. As an extension to this plot style, mummerplot +is also able to offset multiple 1-vs-1 dotplots to form a multiplot where multiple sequences can be +laid out on each axis. This plot style is especially handy for browsing an alignment of two contig +sets. Identity plots are also possible by coloring each data point with a color gradient representing +identity, or by collapsing the y-axis data onto a single line and then vertically offsetting the +data points by their identities. In addition to producing the plot data, mummerplot also generates a +gnuplot script that will be evaluated in order to generate the graph. + + +The *match file* can either be a match list from mummer (either 3 or 4 column format), +the delta file from nucmer or promer, or the default output from show-tiling. mummerplot will +automatically detect the type of input file it is given, regardless of its file extension, or it +will fail if the input file is of an unrecognized type.
+ + + +Optional command line arguments +------------------------------- + +--breaklen Highlight alignments with a breakpoint further than the given distance from the nearest sequence end +--nocolor Color plot lines with a percent similarity gradient or turn off all color (default color by match direction) +--coverage Generate a reference coverage plot, also known as a percent identity plot (default behavior for show-tiling input) +--depend Print dependency information and exit +--filter Only display alignments which represent the "best" one-to-one mapping of reference and query subsequences (requires delta formatted input) +--help Print help information and exit +--layout Layout a multiplot by ordering and orienting sequences such that the largest hits cluster near the main diagonal (requires delta formatted input) +--prefix *do not use in galaxy!* Set the output file prefix (default 'out') +--rv Reverse video, swap the foreground and background colors for x11 plots (requires x11 terminal) +--IdR Select a specific reference sequence for the x-axis +--IdQ Select a specific query sequence for the y-axis +--Rfile Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired reference sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-] +--Qfile Generate a multiplot by using the order and length information contained in this file, either a FastA file of the desired query sequences or a tab-delimited list of sequence IDs, lengths and orientations [ +-] +--size Set the output size to small, medium or large +--large **default enabled to generate highres image**. Other sizes no effect: --small --medium --large +--SNP Highlight SNP locations in the alignment +--terminal *do not use in galaxy* Set the output terminal to x11, postscript or png +--png **either png or postscript for fixed image**. Other interactive x11 not enabled +--postscript Alternate output format instead of png. 
+--xrange Set the x-range for the plot in the form "[min,max]" +--yrange Set the y-range for the plot in the form "[min,max]" +--version Display version information and exit + + + </help> +</tool> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/nucmer_coords2ACT_galaxy.pl Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,42 @@ +#!/usr/bin/perl + +# converts the MUMmer-nucmer coords file into a file readable by the Artemis Comparison Tool +# Output format resembles the BLAST crunch format +# +# [nov 2010] Galaxy wrapped up version +# +# Alex.Bossers@wur.nl + + +use warnings; +use strict; + +#$filename=shift; + #$ARGV[0] =~ m/^([A-Z0-9_.-]+)$/ig; +my $filename = $ARGV[0]; + #$ARGV[1] =~ m/^([A-Z0-9_.-]+)$/ig; +my $fileout = $ARGV[1]; +#my $filename = "Curated_vs_noncurated_8067_01.nucmer.coords"; +#my $fileout = "Curated_vs_noncurated_8067_01.nucmer.tab"; + +open (COORDS, '<', $filename) || die "error opening input coords file: $!"; +open (OUT, '>', $fileout) || die "error opening tab output file: $!"; + +while (<COORDS>) + { + unless ($_ =~ /^(\s*)\d/){next} + $_ =~ s/\|//g; + + my @f = split; + # create crude match score = ((length_of_match * %identity)-(length_of_match * (100 - %identity))) /20 + my $crude_plus_score=($f[4]*$f[6]); + my $crude_minus_score=($f[4]*(100-$f[6])); + my $crude_score= int(($crude_plus_score - $crude_minus_score) / 20); + # reorganise columns and print crunch format to the output file + # score %id S1 E1 seq1 S2 E2 seq2 (description) + print OUT " $crude_score $f[6] $f[0] $f[1] $f[7] $f[2] $f[3] $f[8] nucmer comparison coordinates\n"; + } + +close (COORDS); +close (OUT); +print "Done!\n\n";
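The crude match score computed in the Perl script can be checked by hand with a small shell function. This is a sketch that mirrors the arithmetic in the script's comment, using awk for the floating point math; the function name `crude_score` is hypothetical and not part of the tool suite.

```shell
#!/bin/bash
# Hypothetical helper (not part of the wrapper) reproducing the score formula:
#   score = int((len * identity - len * (100 - identity)) / 20)
# $1 = alignment length (LEN1 column), $2 = percent identity (%IDY column).
crude_score() {
    awk -v len="$1" -v idy="$2" \
        'BEGIN { printf "%d\n", int((len * idy - len * (100 - idy)) / 20) }'
}
```

For an alignment of length 1000 at 98.5 % identity, `crude_score 1000 98.5` prints 4850, since (98500 - 1500) / 20 = 4850.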
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/nucmer_coords2ACT_galaxy.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,39 @@ +<tool id="MUMmer2ACT_tool" name="MUMmer2ACT" version="0.1.alx" force_history_refresh="True"> + <description>: convert MUMmer comparison (coords) file to ACT (Artemis)</description> + <command interpreter="perl"> + nucmer_coords2ACT_galaxy.pl $in_coords $out_act + </command> + <inputs> + <param name="in_coords" type="data" format="tabular" label="MUMmer coords file to use" help="i.e. a nucmer comparison (coords) file" /> + </inputs> + <outputs> + <data name="out_act" format="tabular" label="ACT conversion of coords" /> + </outputs> + <requirements> + <!-- <requirement type="perl-script">nucmer_coords2ACT_galaxy.pl</requirement> --> + </requirements> + <tests> + <test> + </test> + </tests> + <help> +| +| + +**Info** +-------- + +This tool converts the MUMmer comparison file (run MUMmer with the coords option) into a "blast crunch" file +that can be read as a comparison file in the Artemis Comparison Tool (ACT). + +It outputs a single tabular crunch file (save it with the .tab extension on Windows systems). + +**Reference/questions/remarks** + +- *Conversion perl script and wrapper:* Alex Bossers, CVI of Wageningen UR, The Netherlands. + + + + </help> +</tool> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/MUMmer/suite_config.xml Tue Oct 28 16:59:33 2014 +0100 @@ -0,0 +1,22 @@ +<suite id="MUMmer_toolsuite" name="Suite of MUMmer tools" version="1.0.0"> + <description>This suite contains MUMmer genome alignment tools and parsers</description> + <tool id="mummer_tool" name="MUMmer" version="0.4.alx"> + <description>: Compare genomes by alignment (Nucmer or Promer)</description> + </tool> + <tool id="mummer_maxmatch" name="MUMmer MaxMatch" version="0.9.alx" > + <description>: Maximal exact sequence matching</description> + </tool> + <tool id="mummer_clustering" name="MUMmer Clustering" version="0.9.alx"> + <description>: order sequence matches in clusters</description> + </tool> + <tool id="mummer_utilities_tool" name="MUMmer utilities" version="0.9.alx"> + <description>: Show and filter on sequence delta file</description> + </tool> + <tool id="mummerplot_tool" name="MUMmer plot" version="1.0.1"> + <description>: Generate MUMmerplots from MUMmer match file</description> + </tool> + <tool id="MUMmer2ACT_tool" name="MUMmer2ACT" version="0.1.alx"> + <description>: convert MUMmer comparison (coords) file to ACT (Artemis)</description> + </tool> +</suite> +