diff ideas_preprocessor.xml @ 16:aaf64c0d7a0e draft
Uploaded
author: greg
date: Tue, 30 Jan 2018 09:34:17 -0500
parents: 4d542da396a7
children: 6ff92012abb7
--- a/ideas_preprocessor.xml	Thu Jan 25 11:14:36 2018 -0500
+++ b/ideas_preprocessor.xml	Tue Jan 30 09:34:17 2018 -0500
@@ -1,4 +1,4 @@
-<tool id="ideas_preprocessor" name="IDEAS preprocessor" version="1.0.0">
+<tool id="ideas_preprocessor" name="IDEAS Preprocessor" version="1.0.0">
     <description></description>
     <requirements>
         <requirement type="package" version="2.5.4">deeptools</requirement>
@@ -127,7 +127,7 @@
                 </param>
             </when>
             <when value="manual">
-                <repeat name="input_repeat" title="Cell type, Epigenetic factor and Input" min="1">
+                <repeat name="input_repeat" title="Cell type, epigenetic factor and input" min="1">
                     <param name="cell_type_name" type="text" value="" label="Cell type name">
                         <validator type="empty_field"/>
                     </param>
@@ -180,15 +180,53 @@
     </outputs>
     <tests>
         <test>
+            <param name="input" value="e001-h3k4me3.bigwig" ftype="bigwig" dbkey="hg19"/>
+            <param name="specify_chrom_windows" value="yes"/>
+            <param name="chrom_bed_input" value="chrom_windows.bed" ftype="bed" dbkey="hg19"/>
+            <output name="output" file="output.ideaspre" ftype="ideaspre" />
         </test>
     </tests>
     <help>
 **What it does**
 
+Takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc.)
+or any other whole-genome data sets (e.g., scores). Currently the supported data formats are BigWig and BAM.
+All data sets are mapped to common genomic coordinates in a selected assembly (user-provided window size
+or 200 bp windows by default). The user can specify regions to be considered or removed from the analysis.
+The input data may come from one cell type/condition/individual/time point (although this approach does not
+fully utilize the advantages of IDEAS), or from multiple cell types/conditions/individuals/time points. The
+same set of epigenetic features may not be present in all cell types, in which case IDEAS performs imputation
+of the missing tracks if specified.  This tool produces a single dataset with the **IdeasPre** datatype for
+use as input to the IDEAS tool.
+
 -----
 
 **Required options**
 
+* **Set cell type and epigenetic factor names by** - cell type and epigenetic factor names can be set manually or by extracting them from the names of the selected input datasets.  The latter case requires all selected datasets to have names that contain a "-" character.
+
+ * **BAM or BigWig files** - select one or more BAM or BigWig files from your history, making sure that the name of every selected input includes a "-" character (e.g., e001-h3k4me3.bigwig).
+ * **Cell type, epigenetic factor and input** - manually select any number of inputs, setting the cell type and epigenetic factor name for each.  The combination of "cell type name" and "epigenetic factor name" must be unique for each input.  For example, if you have replicate data you may want to specify the cell name as "rep1", "rep2", etc. and the factor name as "rep1", "rep2", etc.
+
+  * **Cell type name** - cell type name if specifying manually.
+  * **Epigenetic factor name** - epigenetic factor name if specifying manually.
+  * **BAM or BigWig file** - BAM or BigWig file.
+  * **Selected input file name pattern is** - select the file name pattern, either **epigenetic factor name-cell type name** or **cell type name-epigenetic factor name**.
+
+* **Define chromosome window positions from a bed file** - select "No" to run whole genome segmentation or select "Yes" to segment genomes within the unit of the windows defined by the bed file.  This file can be in BED3, BED4 or BED5 format, but only the first three columns (chromosome, start position, end position) will be used.
+
+ * **Window size in base pairs** - Window size in base pairs if specifying manually.
+ * **Restrict processing to specified chromosomes** - select "Yes" to restrict processing to specified chromosomes.
+
+  * **Chromosomes** - enter a comma-separated list of chromosomes for processing.
+
+ * **Select bed file for defining chromosome window positions** - select a bed file for specifying the chromosome window positions.
+
+* **Output chromosomes in separate files** - select "Yes" to produce separate files for each chromosome, allowing you to run IDEAS on different chromosomes separately.
+* **Calculate the signal in each window using** - use the bigWigAverageOverBed utility from the UCSC genome browser to calculate the signal (i.e., the number of reads per bp) in each window.
+* **Select bed file(s) containing regions to exclude** - select one or more bed files that contain regions you'd like excluded from your datasets.
+* **Standardize all datasets** - select "Yes" to standardize all datasets (e.g., reads / total_reads * 20 million) so that the signals from different cell types become comparable - your datasets can be read counts, logp-values or fold change.
+
     </help>
     <citations>
         <citation type="doi">10.1093/nar/gkw278</citation>
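
The help text added in this changeset describes two rules that are easy to misread: deriving the cell type and epigenetic factor names from a dataset name containing a "-" character, and standardizing signals as reads / total_reads * 20 million. The Python sketch below illustrates both under stated assumptions. It is not the tool's actual implementation: the function names, the `pattern` values, the stripping of the file extension, and the use of the summed window signal as `total_reads` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def split_dataset_name(name: str, pattern: str = "cell-factor") -> Tuple[str, str]:
    """Return (cell_type, factor) parsed from a name like 'e001-h3k4me3.bigwig'.

    Assumptions (not confirmed by the tool source): the extension is
    dropped before splitting, and the name is split at the first '-'.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot or "-" in ext:
        # No extension, or the last '.' is not an extension separator.
        stem = name
    left, sep, right = stem.partition("-")
    if not sep or not left or not right:
        raise ValueError(f"dataset name {name!r} must contain a '-' character")
    if pattern == "cell-factor":
        return left, right
    if pattern == "factor-cell":
        return right, left
    raise ValueError(f"unknown pattern {pattern!r}")


def check_unique(pairs: List[Tuple[str, str]]) -> None:
    """Raise if any (cell_type, factor) combination appears more than once."""
    seen = set()
    for pair in pairs:
        if pair in seen:
            raise ValueError(f"duplicate cell type / factor combination: {pair}")
        seen.add(pair)


def standardize(signal: List[float], target: float = 20_000_000) -> List[float]:
    """Scale per-window signal as reads / total_reads * 20 million.

    Assumption: total_reads is taken to be the sum of the signal values.
    """
    total = sum(signal)
    if total == 0:
        raise ValueError("cannot standardize a dataset with zero total signal")
    return [value / total * target for value in signal]
```

For instance, `split_dataset_name("e001-h3k4me3.bigwig")` yields the cell type `e001` and the factor `h3k4me3`, matching the example file name used in the help text and in the test case.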