Mercurial > repos > iuc > emboss_needleall

<tool id="emboss_needleall" name="EMBOSS: needleall" version="@VERSION@+galaxy0" profile="@PROFILE@">
  <description>Many-to-many Needleman-Wunsch global alignment</description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="bio_tools" />
  <expand macro="requirements" />
  <version_command>needleall -version</version_command>
  <command detect_errors="exit_code"><![CDATA[
needleall
-asequence '$asequence'
-bsequence '$bsequence'
-outfile '$out_file1'
-gapopen $gapopen
-gapextend $gapextend
-brief $brief
-aformat3 $out_format1
-auto
#if $datafile
-datafile $datafile
#end if
#if $endgap.endweight == 'yes'
-endopen $endgap.endopen
-endextend $endgap.endextend
#end if
-minscore $minscore
]]></command>
  <inputs>
    <param argument="-asequence" type="data" format="fasta,fastq" label="Sequence set 1" />
    <param argument="-bsequence" type="data" format="fasta,fastq" label="Sequence set 2" />

    <expand macro="scoring_matrix"/>
    <expand macro="gap_penalties"/>
    <expand macro="endgap_penalties"/>
    <expand macro="param_brief"/>

    <param argument="-minscore" type="float" value="1.0" min="-10.0" max="100.0" label="Minimum alignment score to report an alignment." help=""/>

    <expand macro="choose_alignment_output_format"/>
  </inputs>
  <outputs>
    <data name="out_file1" format="needle" label="${tool.name} on ${on_string}: alignment output">
      <expand macro="change_alignment_output_format"/>
    </data>
  </outputs>
  <tests>
    <test>
      <param name="asequence" value="emboss_needleall_input1.fa"/>
      <param name="bsequence" value="emboss_needleall_input2.fq"/>
      <param name="gapopen" value="10"/>
      <param name="gapextend" value="0.5"/>
      <param name="brief" value="yes"/>
      <param name="out_format1" value="score"/>
      <output name="out_file1" file="emboss_needleall_out.score" ftype="score"/>
    </test>
    <test><!-- test fasta output -->
      <param name="asequence" value="emboss_needleall_input1.fa"/>
      <param name="bsequence" value="emboss_needleall_input2.fq"/>
      <param name="gapopen" value="10"/>
      <param name="gapextend" value="0.5"/>
      <param name="brief" value="yes"/>
      <param name="out_format1" value="fasta"/>
      <output name="out_file1" file="emboss_needleall_out.fasta" ftype="fasta"/>
    </test>
     <test><!-- test with pair output, endgap penalties and custom scoring matrix -->
      <param name="asequence" value="emboss_needleall_input1.fa"/>
      <param name="bsequence" value="emboss_needleall_input2.fq"/>
      <param name="gapopen" value="10"/>
      <param name="gapextend" value="0.5"/>
      <conditional name="endgap">
        <param name="endweight" value="yes"/>
        <param name="endopen" value="13.37"/>
        <param name="endextend" value="2.5"/>
      </conditional>
      <param name="brief" value="yes"/>
      <param name="datafile" value="EPAM30"/>
      <param name="out_format1" value="pair"/>
      <output name="out_file1" file="emboss_needleall_out.pair" lines_diff="10" ftype="pair"/>
    </test>
  </tests>
  <help><![CDATA[

needleall reads in two nucleotide or protein sequences inputs. Both can be one or more sequences. All sequences in the first input are aligned to all sequences in the second input.

This tool uses the Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps) of two sequences when considering their entire length.

- **Optimal alignment:** Dynamic programming methods ensure the optimal global alignment by exploring all possible alignments and choosing the best.

- **The Needleman-Wunsch algorithm** is a member of the class of algorithms that can calculate the best score and alignment in the order of mn steps, (where 'n' and 'm' are the lengths of the two sequences).

- **Gap open penalty:** [10.0 for any sequence] The gap open penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. (Floating point number from 1.0 to 100.0)

- **Gap extension penalty:** [0.5 for any sequence] The gap extension, penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors in which case you would expect many single base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. (Floating point number from 0.0 to 10.0)

  ]]></help>
  <expand macro="citations" />
</tool>