view longdust.xml @ 2:5e9dd32b702a draft default tip
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/main/tools/longdust commit aa41edcbf8a683b0202d18f9bf906f920370ab7a
| author | iuc |
|---|---|
| date | Mon, 01 Dec 2025 12:21:21 +0000 |
| parents | 9607b6eccee4 |
| children | |
<tool id="longdust" name="longdust" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@" license="MIT">
    <description>Detect low-complexity regions in long sequences</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <command detect_errors="exit_code"><![CDATA[
        longdust '$input' -k $k -w $w -g $g -t $t -e $e $f $a > '$output'
    ]]></command>
    <inputs>
        <param name="input" type="data" format="fasta,fasta.gz" label="Input FASTA file"/>
        <param argument="-k" type="integer" label="k-mer length" value="7" help="k-mer length"/>
        <param argument="-w" type="integer" label="Window size" value="5000" help="Window size"/>
        <param argument="-g" type="float" label="Genome-wide GC content" value="0.5" help="Specify genome-wide GC content"/>
        <param argument="-t" type="float" label="Score threshold" value="0.6" help="Score threshold"/>
        <param argument="-e" type="integer" label="Extension X-drop length" value="50" help="Extension X-drop length (0 to disable)"/>
        <param argument="-f" type="boolean" label="Forward strand only" truevalue="-f" falsevalue="" checked="false" help="Limit analysis to forward strand only"/>
        <param argument="-a" type="boolean" label="Enable guaranteed O(Lw) algorithm" truevalue="-a" falsevalue="" checked="false" help="Use the guaranteed O(Lw) algorithm with increased approximation for faster runtime on large genomes. This mode evaluates only the smallest candidate start per position, which reduces runtime to a strict O(Lw), but it may miss ~5-10% of low-complexity regions compared to the default."/>
    </inputs>
    <outputs>
        <data name="output" format="bed"/>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="input" location="https://zenodo.org/records/17226147/files/GCF_000146045.2_R64_genomic.fna.gz"/>
            <param name="k" value="6"/>
            <param name="w" value="1000"/>
            <param name="t" value="0.55"/>
            <param name="g" value="0.5"/>
            <param name="e" value="0"/>
            <param name="f" value="false"/>
            <param name="a" value="false"/>
            <output name="output" ftype="bed">
                <assert_contents>
                    <has_n_columns n="3"/>
                    <has_n_lines n="7426"/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
.. class:: infomark

**What it does**

*longdust* detects low-complexity (dusty) regions in long DNA sequences. It scans input FASTA sequences using k-mer statistics and reports regions that fall below a complexity threshold. These regions are often repetitive or homopolymeric stretches that may interfere with sequence analysis, alignment, or downstream bioinformatics pipelines.

The method is tunable via parameters for k-mer size, window size, score threshold, and extension length, so you can control how strict or relaxed the detection is.

**Input**

- A FASTA file containing DNA sequences (typically long reads or assembled contigs).
- Optional parameters to configure detection:

  - **-k** : k-mer length (default 7)
  - **-w** : window size (default 5000)
  - **-g** : genome-wide GC content (default 0.5)
  - **-t** : score threshold (default 0.6)
  - **-e** : extension X-drop length, 0 disables extension (default 50)
  - **-f** : forward strand only (optional flag)
  - **-a** : approximate O(Lw) algorithm (optional flag)

* For performance, w < 4^k is recommended, especially when w is large
* Use "-k6 -w1000 -t.55" for more relaxed but shorter regions

**Output**

- A BED file listing the detected low-complexity regions
    ]]></help>
    <expand macro="citations"/>
    <expand macro="creator"/>
</tool>
