--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/.shed.yml	Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,13 @@
+name: ideas_preprocessor
+owner: greg
+description: |
+  Contains a tool that accepts a list of epigenetic data sets and produces an output with datatype IdeasPre.
+homepage_url: http://sites.stat.psu.edu/~yzz2/IDEAS/
+long_description: |
+  Contains a tool that accepts a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation,
+  TFs, etc.) or any other whole-genome data sets (e.g., scores). Currently the supported data formats are BigWig
+  and BAM.  The tool produces a single dataset with the IdeasPre datatype for use as input to the IDEAS tool.
+remote_repository_url: https://github.com/gregvonkuster/galaxy_tools/tree/master/tools/epigenetics/ideas_preprocessor
+type: unrestricted
+categories:
+- Epigenetics
--- a/ideas_preprocessor.xml	Thu Jan 25 11:14:36 2018 -0500
+++ b/ideas_preprocessor.xml	Tue Jan 30 09:34:17 2018 -0500
@@ -1,4 +1,4 @@
-<tool id="ideas_preprocessor" name="IDEAS preprocessor" version="1.0.0">
+<tool id="ideas_preprocessor" name="IDEAS Preprocessor" version="1.0.0">
     <description></description>
     <requirements>
         <requirement type="package" version="2.5.4">deeptools</requirement>
@@ -127,7 +127,7 @@
                 </param>
             </when>
             <when value="manual">
-                <repeat name="input_repeat" title="Cell type, Epigenetic factor and Input" min="1">
+                <repeat name="input_repeat" title="Cell type, epigenetic factor and input" min="1">
                     <param name="cell_type_name" type="text" value="" label="Cell type name">
                         <validator type="empty_field"/>
                     </param>
@@ -180,15 +180,53 @@
     </outputs>
     <tests>
         <test>
+            <param name="input" value="e001-h3k4me3.bigwig" ftype="bigwig" dbkey="hg19"/>
+            <param name="specify_chrom_windows" value="yes"/>
+            <param name="chrom_bed_input" value="chrom_windows.bed" ftype="bed" dbkey="hg19"/>
+            <output name="output" file="output.ideaspre" ftype="ideaspre" />
         </test>
     </tests>
     <help>
 **What it does**

+Takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc.)
+or any other whole-genome data sets (e.g., scores). Currently the supported data formats are BigWig and BAM.
+All data sets are mapped to common genomic coordinates in a selected assembly (user-provided window size
+or 200bp windows by default). The user can specify regions to be considered or removed from the analysis.
+The input data may come from one cell type/condition/individual/time point (although this approach does not
+fully utilize the advantages of IDEAS), or from multiple cell types/conditions/individuals/time points. The
+same set of epigenetic features may not be present in all cell types, in which case IDEAS performs imputation
+of the missing tracks if specified.  This tool produces a single dataset with the **IdeasPre** datatype for
+use as input to the IDEAS tool.
+
 -----

 **Required options**

+* **Set cell type and epigenetic factor names by** - cell type and epigenetic factor names can be set manually or by extracting them from the names of the selected input datasets.  The latter case requires all selected datasets to have names that contain a "-" character.
+
+ * **BAM or BigWig files** - select one or more BAM or BigWig files from your history, making sure that the name of every selected input includes a "-" character (e.g., e001-h3k4me3.bigwig).
+ * **Cell type, epigenetic factor and input** - manually select any number of inputs, setting the cell type and epigenetic factor name for each.  The combination of "cell type name" and "epigenetic factor name" must be unique for each input.  For example, if you have replicate data you may want to specify the cell name as "rep1", "rep2", etc. and the factor name as "rep1", "rep2", etc.
+
+  * **Cell type name** - cell type name if specifying manually.
+  * **Epigenetic factor name** - epigenetic factor name if specifying manually.
+  * **BAM or BigWig file** - BAM or BigWig file.
+  * **Selected input file name pattern is** - select the file name pattern, either **epigenetic factor name-cell type name** or **cell type name-epigenetic factor name**.
+
+* **Define chromosome window positions from a bed file** - select "No" to run whole genome segmentation or select "Yes" to segment genomes within the windows defined by the bed file.  This file can be in BED3, BED4 or BED5 format, but only the first three columns (chromosome, start position, end position) will be used.
+
+ * **Window size in base pairs** - Window size in base pairs if specifying manually.
+ * **Restrict processing to specified chromosomes** - select "Yes" to restrict processing to specified chromosomes.
+
+  * **Chromosomes** - enter a comma-separated list of chromosomes for processing.
+
+ * **Select bed file for defining chromosome window positions** - select a bed file for specifying the chromosome window positions.
+
+* **Output chromosomes in separate files** - select "Yes" to produce separate files for each chromosome, allowing you to run IDEAS on different chromosomes separately.
+* **Calculate the signal in each window using** - use the bigWigAverageOverBed utility from the UCSC genome browser to calculate the signal (i.e., the number of reads per bp) in each window.
+* **Select bed file(s) containing regions to exclude** - select one or more bed files that contain regions you'd like excluded from your datasets.
+* **Standardize all datasets** - select "Yes" to standardize all datasets (e.g., reads / total_reads * 20 million) so that the signals from different cell types become comparable - your datasets can be read counts, logp-values or fold change.
+
     </help>
     <citations>
         <citation type="doi">10.1093/nar/gkw278</citation>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/chrom_windows.bed	Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,50 @@
+chr1	21819600	21819800	R100001
+chr1	21819800	21820000	R100002
+chr1	21820000	21820200	R100003
+chr1	21820200	21820400	R100004
+chr1	21820400	21820600	R100005
+chr1	21820600	21820800	R100006
+chr1	21820800	21821000	R100007
+chr1	21821000	21821200	R100008
+chr1	21821200	21821400	R100009
+chr1	21821400	21821600	R100010
+chr1	21821600	21821800	R100011
+chr1	21821800	21822000	R100012
+chr1	21822000	21822200	R100013
+chr1	21822200	21822400	R100014
+chr1	21822400	21822600	R100015
+chr1	21822600	21822800	R100016
+chr1	21822800	21823000	R100017
+chr1	21823000	21823200	R100018
+chr1	21823200	21823400	R100019
+chr1	21823400	21823600	R100020
+chr1	21823600	21823800	R100021
+chr1	21823800	21824000	R100022
+chr1	21824000	21824200	R100023
+chr1	21824200	21824400	R100024
+chr1	21824400	21824600	R100025
+chr1	21824600	21824800	R100026
+chr1	21824800	21825000	R100027
+chr1	21825000	21825200	R100028
+chr1	21825200	21825400	R100029
+chr1	21825400	21825600	R100030
+chr1	21825600	21825800	R100031
+chr1	21825800	21826000	R100032
+chr1	21826000	21826200	R100033
+chr1	21826200	21826400	R100034
+chr1	21826400	21826600	R100035
+chr1	21826600	21826800	R100036
+chr1	21826800	21827000	R100037
+chr1	21827000	21827200	R100038
+chr1	21827200	21827400	R100039
+chr1	21827400	21827600	R100040
+chr1	21827600	21827800	R100041
+chr1	21827800	21828000	R100042
+chr1	21828000	21828200	R100043
+chr1	21828200	21828400	R100044
+chr1	21828400	21828600	R100045
+chr1	21829000	21829200	R100046
+chr1	21829400	21829600	R100047
+chr1	21829600	21829800	R100048
+chr1	21829800	21830000	R100049
+chr1	21830000	21830200	R100050
Binary file test-data/e001-h3k4me3.bigwig has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/output.ideaspre	Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,9 @@
+<html><head></head><body><h3>History item 3 files prepared for IDEAS</h3>
+<ul>
+<li><a href="chromosome_windows.txt">chromosome_windows.txt</a></li>
+<li><a href="chromosomes.bed">chromosomes.bed</a></li>
+<li><a href="IDEAS_input_config.txt">IDEAS_input_config.txt</a></li>
+<li><a href="tmp">tmp</a></li>
+</ul>
+</body>
+</html>
\ No newline at end of file