Mercurial > repos > greg > ideas_preprocessor
changeset 16:aaf64c0d7a0e draft
Uploaded
author | greg |
---|---|
date | Tue, 30 Jan 2018 09:34:17 -0500 |
parents | ce2021cd68d2 |
children | 6ff92012abb7 |
files | .shed.yml ideas_preprocessor.xml test-data/chrom_windows.bed test-data/e001-h3k4me3.bigwig test-data/output.ideaspre |
diffstat | 5 files changed, 112 insertions(+), 2 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/.shed.yml Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,13 @@
+name: ideas_preprocessor
+owner: greg
+description: |
+ Contains a tool that accepts a list of epigenetic data sets and produces an output with datatype IdeasPre.
+homepage_url: http://sites.stat.psu.edu/~yzz2/IDEAS/
+long_description: |
+ Contains a tool that accepts a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation,
+ TFs, etc.) or any other whole-genome data sets (e.g., scores). Currently the supported data formats are BigWig
+ and BAM. The tool produces a single dataset with the IdeasPre datatype for use as input to the IDEAS tool.
+remote_repository_url: https://github.com/gregvonkuster/galaxy_tools/tree/master/tools/epigenetics/ideas_preprocessor
+type: unrestricted
+categories:
+- Epigenetics
--- a/ideas_preprocessor.xml Thu Jan 25 11:14:36 2018 -0500
+++ b/ideas_preprocessor.xml Tue Jan 30 09:34:17 2018 -0500
@@ -1,4 +1,4 @@
-<tool id="ideas_preprocessor" name="IDEAS preprocessor" version="1.0.0">
+<tool id="ideas_preprocessor" name="IDEAS Preprocessor" version="1.0.0">
 <description></description>
 <requirements>
 <requirement type="package" version="2.5.4">deeptools</requirement>
@@ -127,7 +127,7 @@
 </param>
 </when>
 <when value="manual">
- <repeat name="input_repeat" title="Cell type, Epigenetic factor and Input" min="1">
+ <repeat name="input_repeat" title="Cell type, epigenetic factor and input" min="1">
 <param name="cell_type_name" type="text" value="" label="Cell type name">
 <validator type="empty_field"/>
 </param>
@@ -180,15 +180,53 @@
 </outputs>
 <tests>
 <test>
+ <param name="input" value="e001-h3k4me3.bigwig" ftype="bigwig" dbkey="hg19"/>
+ <param name="specify_chrom_windows" value="yes"/>
+ <param name="chrom_bed_input" value="chrom_windows.bed" ftype="bed" dbkey="hg19"/>
+ <output name="output" file="output.ideaspre" ftype="ideaspre" />
 </test>
 </tests>
 <help>
 **What it does**

+Takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc.)
+or any other whole-genome data sets (e.g., scores). Currently the supported data formats are BigWig and BAM.
+All data sets are mapped by to a common genomic coordinate in a selected assembly (user-provided window size
+or 200bp windows by default). The user can specify regions to be considered or removed from the analysis.
+The input data may come from one cell type/condition/individual/time point (although this approach does not
+fully utilize the advantages of IDEAS), or from multiple cell types/conditions/individuals/time points. The
+same set of epigenetic features may not be present in all cell types, in which case IDEAS perfroms imputation
+of the missing tracks if specified. This tool produces a single dataset with the **IdeasPre** datatype for
+use as input to the IDEAS tool.
+
 -----

 **Required options**

+* **Set cell type and epigenetic factor names by** - cell type and epigenetic factor names can be set manually or by extracting them from the names of the selected input datasets. The latter case requires all selected datasets to have names that contain a "-" character.
+
+ * **BAM or BigWig files** - select one or more Bam or Bigwig files from your history, making sure that the name of every selected input include a "-" character (e.g., e001-h3k4me3.bigwig).
+ * **Cell type, Epigenetic factor and Input** - manually select any number of inputs, setting the cell type and epigenetic factor name for each. The combination of "cell type name" and "epigenetic factor name" must be unique for each input. For example, if you have replicate data you may want to specify the cell name as "rep1", "rep2", etc and the factor name as "rep1", "rep2", etc.
+
+ * **Cell type name** - cell type name if specifying manually.
+ * **Epigenetic factor name** - epigenetic factor name if specifying manually.
+ * **BAM or BigWig file** - BAM or BigWig file.
+ * **Selected input file name pattern is** - select the file name pattern, either **epigenetic factor name-cell type name** or **cell type name-epigenetic factor name**.
+
+* **Define chromosome window positions from a bed file** - select "No" to run whole genome segmentation or select "Yes" to segment genomes within the unit of the windows defined by the bed file. This file can be in BED3, BED4 or BED5 format, but only the first three columns (chr posst posed) will be used.
+
+ * **Window size in base pairs** - Window size in base pairs if specifying manually.
+ * **Restrict processing to specified chromosomes** - select "Yes" to restrict processing to specified chromosomes.
+
+ * **Chromosomes** - enter a comma-separated list of chromosomes for processing.
+
+ * **Select bed file for defining chromosome window positions** - select a bed file for specifying the chromosome window positions.
+
+* **Output chromosomes in separate files** - select "Yes" to produce separate files for each chromosome, allowing you to run IDEAS on different chromosomes separately.
+* **Calculate the signal in each window using** - use the bigWigAverageOverBed utility from the UCSC genome browser to calculate the signal (i.e., the number of reads per bp) in each window.
+* **Select bed file(s) containing regions to exclude** - select one or more bed files that contains regions you'd like excluded from your datasets.
+* **Standardize all datasets** - select "Yes" to standardize all datasets (e.g., reads / total_reads * 20 million) so that the signals from different cell types become comparable - your datasets can be read counts, logp-values or fold change.
+
 </help>
 <citations>
 <citation type="doi">10.1093/nar/gkw278</citation>
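The help text in the diff above describes two conventions that a short example may make concrete: dataset names carry the cell type and epigenetic factor separated by a "-" character, and standardization scales signals as reads / total_reads * 20 million. The following is a minimal Python sketch of both, not the tool's actual code. The function names, the pattern labels `"cell-epi"` and `"epi-cell"`, the choice to split at the first "-", and the stripping of the file extension are all assumptions made for illustration.

```python
import os


def split_dataset_name(name, pattern="cell-epi"):
    """Return (cell_type, factor) from a name such as 'e001-h3k4me3.bigwig'.

    Hypothetical helper: 'pattern' says which part comes first in the name.
    """
    base, _ext = os.path.splitext(name)  # drop '.bigwig' / '.bam' (assumption)
    if "-" not in base:
        raise ValueError("dataset name must contain a '-' character: %r" % name)
    first, second = base.split("-", 1)  # split at the first '-' (assumption)
    if pattern == "cell-epi":
        return first, second
    if pattern == "epi-cell":
        return second, first
    raise ValueError("unknown pattern: %r" % pattern)


def standardize(reads, target_total=20_000_000):
    """Scale per-window values so they sum to target_total.

    Mirrors the 'reads / total_reads * 20 million' example in the help text.
    """
    total = sum(reads)
    if total == 0:
        return [0.0 for _ in reads]
    return [r / total * target_total for r in reads]


cell_type, factor = split_dataset_name("e001-h3k4me3.bigwig")
scaled = standardize([1, 3])
```

With the test dataset name from this changeset, the sketch treats `e001` as the cell type and `h3k4me3` as the epigenetic factor under the cell-type-first pattern.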
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/chrom_windows.bed Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,50 @@
+chr1 21819600 21819800 R100001
+chr1 21819800 21820000 R100002
+chr1 21820000 21820200 R100003
+chr1 21820200 21820400 R100004
+chr1 21820400 21820600 R100005
+chr1 21820600 21820800 R100006
+chr1 21820800 21821000 R100007
+chr1 21821000 21821200 R100008
+chr1 21821200 21821400 R100009
+chr1 21821400 21821600 R100010
+chr1 21821600 21821800 R100011
+chr1 21821800 21822000 R100012
+chr1 21822000 21822200 R100013
+chr1 21822200 21822400 R100014
+chr1 21822400 21822600 R100015
+chr1 21822600 21822800 R100016
+chr1 21822800 21823000 R100017
+chr1 21823000 21823200 R100018
+chr1 21823200 21823400 R100019
+chr1 21823400 21823600 R100020
+chr1 21823600 21823800 R100021
+chr1 21823800 21824000 R100022
+chr1 21824000 21824200 R100023
+chr1 21824200 21824400 R100024
+chr1 21824400 21824600 R100025
+chr1 21824600 21824800 R100026
+chr1 21824800 21825000 R100027
+chr1 21825000 21825200 R100028
+chr1 21825200 21825400 R100029
+chr1 21825400 21825600 R100030
+chr1 21825600 21825800 R100031
+chr1 21825800 21826000 R100032
+chr1 21826000 21826200 R100033
+chr1 21826200 21826400 R100034
+chr1 21826400 21826600 R100035
+chr1 21826600 21826800 R100036
+chr1 21826800 21827000 R100037
+chr1 21827000 21827200 R100038
+chr1 21827200 21827400 R100039
+chr1 21827400 21827600 R100040
+chr1 21827600 21827800 R100041
+chr1 21827800 21828000 R100042
+chr1 21828000 21828200 R100043
+chr1 21828200 21828400 R100044
+chr1 21828400 21828600 R100045
+chr1 21829000 21829200 R100046
+chr1 21829400 21829600 R100047
+chr1 21829600 21829800 R100048
+chr1 21829800 21830000 R100049
+chr1 21830000 21830200 R100050
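The window file above uses 200 bp windows, and the help text states that only the first three BED columns are read. As a rough illustration of how such a file could be inspected before use, here is a small Python sketch that reads the first three columns and reports any gaps between consecutive windows on the same chromosome. The function names are hypothetical, the four sample rows are copied from the test data, and nothing here is taken from the tool's implementation.

```python
def read_windows(lines):
    """Parse (chrom, start, end) from BED lines, ignoring any extra columns."""
    windows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        windows.append((fields[0], int(fields[1]), int(fields[2])))
    return windows


def find_gaps(windows):
    """Return (chrom, gap_start, gap_end) for uncovered stretches between
    consecutive windows on the same chromosome."""
    gaps = []
    for (c1, _s1, e1), (c2, s2, _e2) in zip(windows, windows[1:]):
        if c1 == c2 and s2 > e1:
            gaps.append((c1, e1, s2))
    return gaps


# Four rows copied from test-data/chrom_windows.bed
sample = [
    "chr1 21828400 21828600 R100045",
    "chr1 21829000 21829200 R100046",
    "chr1 21829400 21829600 R100047",
    "chr1 21829600 21829800 R100048",
]
windows = read_windows(sample)
sizes = {end - start for _chrom, start, end in windows}
gaps = find_gaps(windows)
```

On these rows every window is 200 bp wide, and the sketch reports two uncovered stretches, between R100045 and R100046 and between R100046 and R100047.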
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/output.ideaspre Tue Jan 30 09:34:17 2018 -0500
@@ -0,0 +1,9 @@
+<html><head></head><body><h3>History item 3 files prepared for IDEAS</h3>
+<ul>
+<li><a href="chromosome_windows.txt">chromosome_windows.txt</a></li>
+<li><a href="chromosomes.bed">chromosomes.bed</a></li>
+<li><a href="IDEAS_input_config.txt">IDEAS_input_config.txt</a></li>
+<li><a href="tmp">tmp</a></li>
+</ul>
+</body>
+</html>
\ No newline at end of file
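The expected output above is an HTML index that links to the files prepared for IDEAS. For readers who want to check such an index programmatically, the following Python sketch collects the `href` targets with the standard library's `html.parser`. The class and function names are hypothetical and the markup is copied from the test fixture. This is only one way to read the listing, not something the tool itself is known to do.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value is not None:
                    self.hrefs.append(value)


def list_prepared_files(html_text):
    """Return the link targets listed in an IdeasPre-style HTML index."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Markup copied from test-data/output.ideaspre
fixture = """<html><head></head><body><h3>History item 3 files prepared for IDEAS</h3>
<ul>
<li><a href="chromosome_windows.txt">chromosome_windows.txt</a></li>
<li><a href="chromosomes.bed">chromosomes.bed</a></li>
<li><a href="IDEAS_input_config.txt">IDEAS_input_config.txt</a></li>
<li><a href="tmp">tmp</a></li>
</ul>
</body>
</html>"""
prepared = list_prepared_files(fixture)
```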