view multigps.xml @ 9:32c8c38cd651 draft
Uploaded
author | greg |
---|---|
date | Tue, 05 Jan 2016 08:19:20 -0500 |
parents | 949d2dedaedc |
children | 426f8753acb2 |
<tool id="multigps" name="MultiGPS" version="0.5.0.0">
    <description>analyzes collections of multi-condition ChIP-seq data</description>
    <macros>
        <import>multigps_macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command>
        <![CDATA[
            #import os
            #import tempfile
            #set aoc = $advanced_options_cond
            #if str($aoc.advanced_options) == "display":
                #set umc = $aoc.use_motif_cond
                #if str($umc.use_motif) == "yes":
                    #set rgc = $umc.reference_genome_cond
                    #if str($rgc.reference_genome_source) == "cached":
                        #set seq_dir = os.path.split($rgc.reference_genome.fields.path)[0]
                    #else:
                        ## MultiGPS requires a directory containing reference files, so symlink the history dataset.
                        #set seq_dir = tempfile.mkdtemp(prefix="tmp-multigps-seq-dir")
                        #set seq_file = str($rgc.reference_genome)
                        #set tmp_filename = "%s.fa" % str($rgc.reference_genome.dbkey)
                        #set tmp_seq_file = os.path.join($seq_dir, $tmp_filename)
                        ## ln -s TARGET LINK_NAME: link the dataset into seq_dir, never overwrite the dataset.
                        ln -f -s $seq_file $tmp_seq_file &&
                    #end if
                #end if
            #end if
            python $__tool_directory__/multigps.py
            --multigps_jar $__tool_directory__/multigps_v0.5.jar
            #for $i in $input_items:
                #set replicate_name = ""
                #set read_distribution_file = ""
                #set fixed_read = ""
                #set sccond = $i.signal_control_cond
                #set sorc = $sccond.signal_control
                #if str($sorc) == "Signal":
                    #set replicate_name = $sccond.replicate_name
                    #set rdcond = $sccond.read_distribution_cond
                    #if str($rdcond.read_distribution) == "yes":
                        #set read_distribution_file = "$rdcond.read_distribution_file"
                    #end if
                    #if str($sccond.fixed_read_count) == "yes":
                        #set fixed_read = "P"
                    #end if
                #else if str($sorc) == "Control":
                    #set rncond = $sccond.replicate_name_cond
                    #if str($rncond.specify_replicate_name) == "yes":
                        #set replicate_name = $rncond.replicate_name
                        #set rdcond = $rncond.read_distribution_cond
                        #if str($rdcond.read_distribution) == "yes":
                            #set read_distribution_file = "$rdcond.read_distribution_file"
                        #end if
                        #if str($rncond.fixed_read_count) == "yes":
                            #set fixed_read = "P"
                        #end if
                    #end if
                #end if
                --input_item "${i.input}" "${i.input.ext}" "${i.signal_control_cond.signal_control}" "${i.condition_name}" "$replicate_name" "$read_distribution_file" "$fixed_read"
            #end for
            --threads="\${GALAXY_SLOTS:-4}"
            --geninfo $chromInfo
            #if str($aoc.advanced_options) == "display":
                #set rbec = $aoc.report_binding_events_cond
                #set bmsc = $aoc.binding_model_smoothing_cond
                #set rloc = $aoc.reads_limits_options_cond
                #set sdc = $aoc.scale_data_cond
                --use_motif $umc.use_motif
                #if str($umc.use_motif) == "yes":
                    #set mpc = $umc.multigps_priors_cond
                    --seq_dir $seq_dir
                    #if str($mpc.multigps_priors) == "yes":
                        #set bmc = $mpc.both_motifs_cond
                        --positional_prior $mpc.positional_prior
                        --events_shared_probability $mpc.events_shared_probability
                        --motifs $bmc.motifs
                        #if str($bmc.motifs) == "yes":
                            --num_motifs $bmc.num_motifs
                            --mememinw $bmc.min_motif_width
                            --mememaxw $bmc.max_motif_width
                        #else:
                            #set mfoc = $bmc.motif_finding_only_cond
                            --motif_finding_only $mfoc.motif_finding_only
                            #if str($mfoc.motif_finding_only) == "yes":
                                --num_motifs $mfoc.num_motifs
                                --mememinw $mfoc.min_motif_width
                                --mememaxw $mfoc.max_motif_width
                            #end if
                        #end if
                    #end if
                #end if
                --max_training_rounds $aoc.max_training_rounds
                --exclude_file $aoc.exclude_file
                --binding_model_updates $aoc.binding_model_updates
                --minmodelupdateevents $aoc.minmodelupdateevents
                --binding_model_smoothing $bmsc.binding_model_smoothing
                #if str($bmsc.binding_model_smoothing) == "yes":
                    --spline_smooth $bmsc.spline_smooth
                #else:
                    #set gmsc = $bmsc.gauss_model_smoothing_cond
                    #if str($gmsc.gauss_model_smoothing) == "yes":
                        --gauss_smooth $gmsc.gauss_smooth
                    #end if
                #end if
                --joint_in_model $aoc.joint_in_model
                --ml_config_not_shared $aoc.ml_config_not_shared
                #if str($rloc.reads_limits) == "yes":
                    --fixedpb $rloc.fixedpb
                    --poissongausspb $rloc.poissongausspb
                    --non_unique_reads $rloc.non_unique_reads
                #end if
                #if str($rbec.report_binding_events) == "yes":
                    --minqvalue $rbec.minqvalue
                    --minfold $rbec.minfold
                    --diff_enrichment_tests $rbec.diff_enrichment_tests
                    --edgerod $rbec.edgerod
                    --diffp $rbec.diffp
                #end if
                #if str($sdc.scale_data) == "yes":
                    --noscaling $sdc.scaling
                    --medianscale $sdc.medianscale
                    --sesscale $sdc.sesscale
                    --scalewin $sdc.scalewin
                #end if
            #end if
            --output_html_path "$output_html"
            --output_html_files_path "$output_html.files_path"
            #if str($output_process_log) == "yes":
                --output_process_path "$output_process"
            #end if
        ]]>
    </command>
    <inputs>
        <repeat name="input_items" title="Input files, attributes and options" min="1">
            <param name="input" type="data" format="bam,bed,scidx" label="Add input file" help="Supported formats are bam, bed and scidx.">
                <validator type="unspecified_build" />
            </param>
            <conditional name="signal_control_cond">
                <param name="signal_control" type="select" label="Is this experiment signal or control?">
                    <option value="Signal" selected="True">Signal</option>
                    <option value="Control">Control</option>
                </param>
                <when value="Signal">
                    <param name="replicate_name" type="text" label="Replicate name" />
                    <expand macro="rd_cond" />
                    <expand macro="frc_param" />
                </when>
                <when value="Control">
                    <conditional name="replicate_name_cond">
                        <param name="specify_replicate_name" type="select" label="Specify replicate name?" help="Optional for control experiments. If used, the control will only be used for the corresponding named signal replicate.">
                            <option value="no" selected="True">No</option>
                            <option value="yes">Yes</option>
                        </param>
                        <when value="yes">
                            <param name="replicate_name" type="text" optional="True" label="Replicate name" />
                            <expand macro="rd_cond" />
                            <expand macro="frc_param" />
                        </when>
                        <when value="no" />
                    </conditional>
                </when>
            </conditional>
            <param name="condition_name" type="text" label="Condition name" />
        </repeat>
        <conditional name="advanced_options_cond">
            <param name="advanced_options" type="select" label="Advanced options">
                <option value="hide" selected="true">Hide</option>
                <option value="display">Display</option>
            </param>
            <when value="display">
                <conditional name="use_motif_cond">
                    <param name="use_motif" type="select" label="Perform motif-finding or use a motif-prior?">
                        <option value="no" selected="True">No</option>
                        <option value="yes">Yes</option>
                    </param>
                    <when value="yes">
                        <conditional name="reference_genome_cond">
                            <param name="reference_genome_source" type="select" label="Choose the source for the reference genome">
                                <option value="cached">Locally Cached</option>
                                <option value="history">From History</option>
                            </param>
                            <when value="cached">
                                <param name="reference_genome" type="select" label="Using reference genome">
                                    <options from_data_table="fasta_indexes"/>
                                    <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
                                </param>
                            </when>
                            <when value="history">
                                <param name="reference_genome" type="data" format="fasta" label="Using reference genome"/>
                            </when>
                        </conditional>
                        <conditional name="multigps_priors_cond">
                            <param name="multigps_priors" type="select" label="Specify MultiGPS priors options?">
                                <option value="no" selected="True">No</option>
                                <option value="yes">Yes</option>
                            </param>
                            <when value="no" />
                            <when value="yes">
                                <param name="positional_prior" type="select" label="Perform inter-experiment positional prior?">
                                    <option value="yes" selected="True">Yes</option>
                                    <option value="no">No</option>
                                </param>
                                <param name="events_shared_probability" type="float" value="0.9" min="0.0" max="1.0" label="Probability that events are shared across conditions" />
                                <conditional name="both_motifs_cond">
                                    <param name="motifs" type="select" label="Perform both motif-finding and motif priors?">
                                        <option value="yes" selected="True">Yes</option>
                                        <option value="no">No</option>
                                    </param>
                                    <when value="yes">
                                        <expand macro="motif_finding_params" />
                                    </when>
                                    <when value="no">
                                        <conditional name="motif_finding_only_cond">
                                            <param name="motif_finding_only" type="select" label="Perform motif-finding only?" help="Selecting Yes turns off motif priors.">
                                                <option value="no" selected="True">No</option>
                                                <option value="yes">Yes</option>
                                            </param>
                                            <when value="no" />
                                            <when value="yes">
                                                <expand macro="motif_finding_params" />
                                            </when>
                                        </conditional>
                                    </when>
                                </conditional>
                            </when>
                        </conditional>
                    </when>
                    <when value="no" />
                </conditional>
                <param name="max_training_rounds" type="integer" value="3" min="0" label="Maximum number of training rounds for updating binding event read distributions" />
                <param name="exclude_file" type="data" optional="True" format="txt" label="Optional file containing a set of regions to ignore during MultiGPS training" help="Ideally exclude the mitochondrial genome and other blacklisted regions that contain artifactual accumulations of reads in both ChIP-seq and control experiments." />
                <param name="binding_model_updates" type="select" label="Perform binding model updates?">
                    <option value="yes" selected="True">Yes</option>
                    <option value="no">No</option>
                </param>
                <param name="minmodelupdateevents" type="integer" value="0" min="0" label="Minimum number of events to support an update of the read distribution" />
                <conditional name="binding_model_smoothing_cond">
                    <param name="binding_model_smoothing" type="select" label="Perform binding model smoothing?" help="Smoothing is performed with a cubic spline.">
                        <option value="yes" selected="True">Yes</option>
                        <option value="no">No</option>
                    </param>
                    <when value="yes">
                        <param name="spline_smooth" type="integer" value="0" min="0" label="Smoothing factor" />
                    </when>
                    <when value="no">
                        <conditional name="gauss_model_smoothing_cond">
                            <param name="gauss_model_smoothing" type="select" label="Use Gaussian model smoothing?" help="Select No to smooth with a cubic spline.">
                                <option value="no" selected="True">No</option>
                                <option value="yes">Yes</option>
                            </param>
                            <when value="yes">
                                <param name="gauss_smooth" type="integer" value="3" min="0" label="Smoothing factor" help="Gaussian smoothing standard deviation." />
                            </when>
                            <when value="no" />
                        </conditional>
                    </when>
                </conditional>
                <param name="joint_in_model" type="select" label="Allow joint events in model updates?">
                    <option value="no" selected="True">No</option>
                    <option value="yes">Yes</option>
                </param>
                <param name="ml_config_not_shared" type="select" label="Share component configs in the ML step?">
                    <option value="yes" selected="True">Yes</option>
                    <option value="no">No</option>
                </param>
                <conditional name="reads_limits_options_cond">
                    <param name="reads_limits" type="select" label="Set limits on how many reads can have their 5′ end at the same position in each replicate?" help="Default behavior is to estimate a global per-base limit from a Poisson distribution parameterized by the number of reads divided by the number of mappable bases in the genome. The per-base limit is set as the count corresponding to the 10^-7 probability level from the Poisson.">
                        <option value="no" selected="True">No</option>
                        <option value="yes">Yes</option>
                    </param>
                    <when value="no" />
                    <when value="yes">
                        <param name="fixedpb" type="integer" value="0" min="0" label="Fixed per-base limit" />
                        <param name="poissongausspb" type="integer" value="0" min="0" label="Poisson threshold for filtering per base" help="Filter per base using the specified Poisson threshold parameterized by a local Gaussian sliding window." />
                        <param name="non_unique_reads" type="select" label="Use non-unique reads">
                            <option value="no" selected="True">No</option>
                            <option value="yes">Yes</option>
                        </param>
                    </when>
                </conditional>
                <conditional name="scale_data_cond">
                    <param name="scale_data" type="select" label="Set data scaling parameters?" help="Default behavior is to scale signal to corresponding controls using regression on the set of signal/control ratios in 10 kbp windows.">
                        <option value="no" selected="True">No</option>
                        <option value="yes">Yes</option>
                    </param>
                    <when value="yes">
                        <param name="scaling" type="select" label="Use signal vs control scaling?">
                            <option value="yes" selected="True">Yes</option>
                            <option value="no">No</option>
                        </param>
                        <param name="medianscale" type="select" label="Use the median signal/control ratio as the scaling factor?">
                            <option value="yes" selected="True">Yes</option>
                            <option value="no">No</option>
                        </param>
                        <param name="sesscale" type="select" label="Estimate scaling factor by SES?" help="SES: Diaz et al. Stat Appl Genet Mol Biol. 2012">
                            <option value="yes" selected="True">Yes</option>
                            <option value="no">No</option>
                        </param>
                        <param name="scalewin" type="integer" min="0" value="10000" label="Window size for estimating scaling ratios" help="The value is the number of base pairs. Use something much smaller than the default if scaling via SES (e.g. 200)." />
                    </when>
                    <when value="no" />
                </conditional>
                <conditional name="report_binding_events_cond">
                    <param name="report_binding_events" type="select" label="Report binding events?">
                        <option value="no" selected="True">No</option>
                        <option value="yes">Yes</option>
                    </param>
                    <when value="no" />
                    <when value="yes">
                        <param name="minqvalue" type="integer" min="0" value="0" label="Minimum Q-value (corrected p-value) of reported binding events" />
                        <param name="minfold" type="integer" min="0" value="0" label="Minimum event fold-change vs scaled control" />
                        <param name="diff_enrichment_tests" type="select" label="Run differential enrichment tests?">
                            <option value="yes" selected="True">Yes</option>
                            <option value="no">No</option>
                        </param>
                        <param name="edgerod" type="integer" min="0" value="0" label="EdgeR over-dispersion parameter value" />
                        <param name="diffp" type="integer" min="0" value="0" label="Minimum p-value for reporting differential enrichment" />
                    </when>
                </conditional>
            </when>
            <when value="hide"/>
        </conditional>
        <param name="output_process_log" type="select" label="Output MultiGPS process log?">
            <option value="no" selected="True">No</option>
            <option value="yes">Yes</option>
        </param>
    </inputs>
    <outputs>
        <data name="output_process" format="txt" label="${tool.name} on ${on_string} (process log)">
            <filter>output_process_log == "yes"</filter>
        </data>
        <data name="output_html" format="html"/>
    </outputs>
    <tests>
        <test>
            <repeat name="input_items">
                <param name="input" value="sacCer3_1.scidx" ftype="scidx" dbkey="sacCer3"/>
                <param name="signal_control" value="Signal"/>
                <param name="condition_name" value="Abf1"/>
                <param name="replicate_name" value="1"/>
                <param name="read_distribution" value="no"/>
                <param name="fixed_read_count" value="no"/>
            </repeat>
            <param name="binding_model_smoothing" value="no"/>
            <param name="gauss_model_smoothing" value="yes"/>
            <param name="gauss_smooth" value="3"/>
            <param name="use_motif" value="yes"/>
            <param name="reference_genome_source" value="history"/>
            <param name="reference_genome" value="phiX.fasta" dbkey="phiX"/>
            <param name="num_motifs" value="3"/>
            <param name="min_motif_width" value="6"/>
            <param name="max_motif_width" value="16"/>
            <param name="output_process_log" value="yes"/>
            <output name="output_process" file="output_process1.txt" ftype="txt" lines_diff="12"/>
            <output name="output_html" file="output_html1.html" ftype="html" lines_diff="12"/>
        </test>
    </tests>
    <help>
**What it does**

MultiGPS is a framework for analyzing collections of multi-condition ChIP-seq datasets and characterizing differential binding events between conditions. MultiGPS encourages consistency in the reported binding event locations across conditions and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS loads all data into memory, so you will need a lot of available memory if you are running an analysis over many conditions or large datasets.

-----

**Options**

* **Input files, attributes and options**

  - **Is this experiment signal or control?** - Designate the associated input file as a “signal” or “control” experiment.
  - **Condition name** - Name of the experimental condition to which this input file belongs.
  - **Replicate name** - Name of the replicate. This is optional for control experiments; if defined, the control will only be used for the corresponding named signal replicate.
  - **Read distribution file** - Optional binding event read distribution file (appropriate for the specified replicate) for initializing models. If not specified, the default distribution is used. The true distribution of reads around binding events is estimated during MultiGPS training.
  - **Use fixed per-base read count limit for this replicate?** - Optional fixed per-base read count limit for the specified replicate. Selecting "Yes" sets a read count limit that varies along the genome according to how neighboring bases are distributed, while selecting "No" sets a global per-base limit that is estimated from a Poisson distribution.

* **Perform motif-finding or use a motif-prior?** - Integrate motif-finding or use a motif-prior via MEME.

  - **Choose the source for the reference genome** - Reference data can be locally cached or selected from the Galaxy history.
  - **Perform inter-experiment positional prior?** - Specify whether to apply a positional prior that encourages binding events to be reported at the same locations across conditions.
  - **Probability that events are shared across conditions** - Prior probability that a binding event is shared across conditions.
  - **Perform both motif-finding and motif priors?** - Select "No" to turn off motif-finding and motif priors.
  - **Perform motif-finding only?** - Select "Yes" to turn off motif priors, performing motif-finding only.
  - **Number of motifs MEME should find for each condition** - Number of motifs MEME should find for each condition.
  - **Minimum motif width for MEME** - Minimum motif width argument for MEME.
  - **Maximum motif width for MEME** - Maximum motif width argument for MEME.

* **General Advanced Options**

  - **Maximum number of training rounds for updating binding event read distributions** - Upper limit on the number of training rounds in which the binding event read distributions are re-estimated.
  - **Optional file containing a set of regions to ignore during MultiGPS training** - It’s a good idea to exclude the mitochondrial genome and other ‘blacklisted’ regions that contain artifactual accumulations of reads in both ChIP-seq and control experiments. MultiGPS will waste time trying to model binding events in these regions, even though they will not typically appear significantly enriched over the control (and thus will not be reported to the user).
  - **Perform binding model updates?** - Specify whether to update the binding model (the read distribution around binding events) during training.
  - **Minimum number of events to support an update of the read distribution** - Minimum number of binding events required before the read distribution is updated.
  - **Perform binding model smoothing?** - Smooth the binding model with a cubic spline using the specified smoothing factor.
  - **Use Gaussian model smoothing?** - If binding model smoothing is not performed, select "Yes" to apply Gaussian smoothing with the specified smoothing factor (the Gaussian standard deviation).
  - **Allow joint events in model updates?** - Specify whether to allow joint events in model updates.
  - **Share component configs in the ML step?** - Specify whether to share component configurations in the ML step. This mainly affects the quantification of binding levels for binding events that are not shared but lie at nearby positions in different experiments.

* **Set limits on how many reads can have their 5′ end at the same position in each replicate?**

  - **Fixed per-base limit** - Maximum number of reads allowed to have their 5′ end at any single position.
  - **Poisson threshold for filtering per base** - Derive the per-base limit from a Poisson threshold parameterized by a local Gaussian sliding window, so that the limit depends on neighboring positions.
  - **Use non-unique reads** - Specify whether to use non-unique reads.

* **Set data scaling parameters?**

  - **Use signal vs control scaling?** - Specify whether to use signal vs control scaling.
  - **Use the median signal/control ratio as the scaling factor?** - Specify whether to use the median signal/control ratio as the scaling factor.
  - **Estimate scaling factor by SES?** - Specify whether to estimate the scaling factor by SES (signal extraction scaling; Diaz et al., 2012).
  - **Window size for estimating scaling ratios** - Window size in base pairs for estimating scaling ratios.

* **Report binding events?**

  - **Minimum Q-value (corrected p-value) of reported binding events** - Minimum Q-value (corrected p-value) of reported binding events.
  - **Minimum event fold-change vs scaled control** - Minimum event fold-change vs scaled control.
  - **Run differential enrichment tests?** - Choose whether to run differential enrichment tests.
  - **EdgeR over-dispersion parameter value** - EdgeR over-dispersion parameter value.
  - **Minimum p-value for reporting differential enrichment** - Minimum p-value for reporting differential enrichment.

* **Output MultiGPS process log?** - Select "Yes" to produce a second output dataset that contains the MultiGPS process log.

    </help>
    <expand macro="citations" />
</tool>
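When the reference genome comes from the history, the command creates a temporary directory and links the FASTA dataset into it as `<dbkey>.fa`, because MultiGPS expects a directory of reference files (`--seq_dir`). The sketch below mirrors that staging step in Python. It assumes a POSIX filesystem that supports symbolic links, and `stage_reference` is a hypothetical helper, not part of the wrapper. Note the argument order of `os.symlink(src, dst)`: like `ln -s TARGET LINK_NAME`, it takes the existing dataset first and the new link name second.

```python
import os
import shutil
import tempfile


def stage_reference(fasta_path: str, dbkey: str) -> str:
    """Return a new temporary directory holding '<dbkey>.fa' as a symlink to fasta_path.

    Hypothetical helper that mirrors the wrapper's Cheetah logic.
    """
    seq_dir = tempfile.mkdtemp(prefix="tmp-multigps-seq-dir")
    link_path = os.path.join(seq_dir, "%s.fa" % dbkey)
    # os.symlink(src, dst): src is the existing dataset, dst is the link to create.
    os.symlink(os.path.abspath(fasta_path), link_path)
    return seq_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        fasta = os.path.join(work, "dataset_1.dat")
        with open(fasta, "w") as handle:
            handle.write(">phiX\nGAGTTTTATCGCTTCCATGAC\n")
        seq_dir = stage_reference(fasta, "phiX")
        print(sorted(os.listdir(seq_dir)))  # ['phiX.fa']
        shutil.rmtree(seq_dir)  # removes the link only, not the dataset
```

Because the link is always created inside a freshly made directory, nothing at the dataset's own path is ever replaced.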
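Inside the `#for` loop, each repeat element becomes one `--input_item` group of seven positional values. These are the dataset path, the datatype, `Signal`/`Control`, the condition name, the replicate name, the read distribution file, and `P` when the fixed per-base read count option is selected. Any value that is not set becomes an empty string. The following sketch restates that mapping in Python so the argument order can be checked outside Galaxy. The `InputItem` dataclass and `input_item_args` function are hypothetical stand-ins for the Cheetah objects. How `multigps.py` consumes the values is not shown in this file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputItem:
    """Hypothetical flattened view of one 'input_items' repeat element."""
    path: str
    ext: str                                   # "bam", "bed" or "scidx"
    signal_control: str                        # "Signal" or "Control"
    condition_name: str
    replicate_name: Optional[str] = None       # None: not specified (Control only)
    read_distribution_file: Optional[str] = None
    fixed_read_count: bool = False


def input_item_args(item: InputItem) -> List[str]:
    """Return the '--input_item' flag followed by its seven positional values."""
    replicate = read_dist = fixed = ""
    # Signal items always expose these options; Control items only when a
    # replicate name was specified, as in the Cheetah #if/#else if branches.
    if item.signal_control == "Signal" or item.replicate_name is not None:
        replicate = item.replicate_name or ""
        read_dist = item.read_distribution_file or ""
        fixed = "P" if item.fixed_read_count else ""
    return ["--input_item", item.path, item.ext, item.signal_control,
            item.condition_name, replicate, read_dist, fixed]


if __name__ == "__main__":
    signal = InputItem("abf1.scidx", "scidx", "Signal", "Abf1", replicate_name="1")
    print(input_item_args(signal))
    # ['--input_item', 'abf1.scidx', 'scidx', 'Signal', 'Abf1', '1', '', '']
```

Returning a list rather than a string keeps empty values as distinct arguments. The Cheetah template achieves the same thing by wrapping each value in double quotes.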
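According to the help for the read-limit option, the default is a global per-base cap taken from a Poisson distribution, at the 10^-7 probability level. The mean of that distribution is the number of reads divided by the number of mappable bases. One way to compute such a cap is sketched below. It assumes the cap is the smallest count k whose upper tail P(X > k) is at most 10^-7, and that the cap is never below one read. Both conventions and the function name are assumptions of this illustration, not details taken from the MultiGPS source.

```python
import math


def poisson_per_base_limit(total_reads: int, mappable_bases: int,
                           tail_prob: float = 1e-7) -> int:
    """Smallest k with P(X > k) <= tail_prob for X ~ Poisson(total_reads / mappable_bases).

    The tail convention and the minimum of one read are assumptions of this sketch.
    """
    if mappable_bases <= 0:
        raise ValueError("mappable_bases must be positive")
    if not 1e-12 <= tail_prob < 1.0:
        # Much smaller tails cannot be resolved with 1.0 - cdf in double precision.
        raise ValueError("tail_prob must lie in [1e-12, 1)")
    lam = total_reads / mappable_bases
    if lam > 700:
        # math.exp(-lam) would underflow to 0.0 and the loop would never end.
        raise ValueError("mean reads per base too large for this simple sketch")
    pmf = math.exp(-lam)   # P(X = 0)
    cdf = pmf              # P(X <= 0)
    k = 0
    # Walk up the distribution until the remaining upper tail is small enough.
    while 1.0 - cdf > tail_prob:
        k += 1
        pmf *= lam / k     # P(X = k) from P(X = k - 1)
        cdf += pmf
    return max(k, 1)


if __name__ == "__main__":
    # 20 million reads over a 12 Mbp mappable genome: about 1.67 reads per base.
    print(poisson_per_base_limit(20_000_000, 12_000_000))  # 12
```

With a typical mammalian-scale genome the mean per base is far below one, so the cap drops to two reads per base. For example, 20 million reads over 3 Gbp gives a cap of 2.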
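The scaling options estimate a signal/control scaling factor from read counts summed over fixed-size windows (`scalewin`, 10,000 bp by default). The sketch below illustrates only the median-ratio variant; it is not the regression or SES estimator. It assumes per-base read counts held in plain Python lists covering the same positions. It also ignores windows in which either track has no reads. That choice and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def window_sums(counts: Sequence[int], window: int) -> List[int]:
    """Sum per-base counts over consecutive, non-overlapping windows (the last may be short)."""
    return [sum(counts[i:i + window]) for i in range(0, len(counts), window)]


def median_ratio_scaling(signal: Sequence[int], control: Sequence[int],
                         window: int = 10000) -> float:
    """Median of per-window signal/control ratios, skipping windows empty in either track."""
    if len(signal) != len(control):
        raise ValueError("signal and control must cover the same positions")
    ratios = [s / c
              for s, c in zip(window_sums(signal, window), window_sums(control, window))
              if s > 0 and c > 0]
    if not ratios:
        raise ValueError("no window has reads in both tracks")
    return median(ratios)


if __name__ == "__main__":
    signal = [2] * 10 + [4] * 10 + [0] * 10
    control = [1] * 10 + [1] * 10 + [5] * 10
    # Window ratios: 20/10 = 2.0 and 40/10 = 4.0; the third window has no signal.
    print(median_ratio_scaling(signal, control, window=10))  # 3.0
```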
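When spline smoothing is turned off, the form offers Gaussian smoothing, and its smoothing factor is described as the Gaussian standard deviation. The sketch below is a generic illustration of that kind of smoothing, not the MultiGPS implementation. It convolves a read profile with a normalized Gaussian kernel truncated at three standard deviations and renormalizes at the array edges. The truncation width and edge handling are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def gaussian_smooth(values: Sequence[float], sigma: float) -> List[float]:
    """Smooth values with a Gaussian kernel (std. dev. sigma, truncated at 3 sigma).

    Near the edges, only kernel weights that fall inside the array are used and
    the result is renormalised by their sum (an assumption of this sketch).
    """
    if sigma <= 0:
        return list(values)
    radius = int(math.ceil(3 * sigma))
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        weight_sum = 0.0
        for j, w in enumerate(kernel):
            pos = i + j - radius
            if 0 <= pos < n:
                acc += w * values[pos]
                weight_sum += w
        out.append(acc / weight_sum)
    return out


if __name__ == "__main__":
    spike = [0.0] * 20 + [1.0] + [0.0] * 20
    smoothed = gaussian_smooth(spike, 3)
    print(round(sum(smoothed), 6))  # 1.0
```

Away from the edges, the normalized kernel preserves the total number of reads while spreading a sharp peak over neighboring positions.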