comparison tools/annotatePeaks.xml @ 54:675d25a0b9d4
Uploaded
| author | bgruening |
|---|---|
| date | Mon, 12 Aug 2013 08:16:21 -0400 |
| parents | |
| children | |
| 53:a281b5931ffb | 54:675d25a0b9d4 |
|---|---|
| 1 <tool id="homer_annotatePeaks" name="homer_annotatePeaks" version="0.0.5"> | |
| 2 <requirements> | |
| 3 <requirement type="package" version="4.1">homer</requirement> | |
| 4 </requirements> | |
| 5 <description></description> | |
| 6 <!--<version_command></version_command>--> | |
| 7 <command> | |
| 8 annotatePeaks.pl $input_bed $genome_selector 1> $out_annotated | |
| 9 2> $out_log || echo "Error running annotatePeaks." >&2 | |
| 10 </command> | |
| 11 <inputs> | |
| 12 <param format="tabular,bed" name="input_bed" type="data" label="Homer peaks OR BED format"/> | |
| 13 <param name="genome_selector" type="select" label="Genome version"> | |
| 14 <option value="hg19" selected="true">hg19</option> | |
| 15 </param> | |
| 16 <param type="text" name="options" label="Extra options" value="" help="See link below for more options"> | |
| 17 <sanitizer> | |
| 18 <valid initial="string.printable"> | |
| 19 <remove value="'"/> | |
| 20 <remove value="/"/> | |
| 21 </valid> | |
| 22 <mapping initial="none"> | |
| 23 <add source="'" target="__sq__"/> | |
| 24 </mapping> | |
| 25 </sanitizer> | |
| 26 </param> | |
| 27 </inputs> | |
| 28 <outputs> | |
| 29 <!--<data format="html" name="html_outfile" label="index" />--> | |
| 30 <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />--> | |
| 31 <data format="csv" name="out_annotated" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}" /> | |
| 32 <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}.log" /> | |
| 33 </outputs> | |
| 34 <tests> | |
| 35 <test> | |
| 36 <!--<param name="input_file" value="extract_genomic_dna.fa" />--> | |
| 37 <!--<output name="html_file" file="sample_output.html" ftype="html" />--> | |
| 38 </test> | |
| 39 </tests> | |
| 40 | |
| 41 <help> | |
| 42 | |
| 43 .. class:: infomark | |
| 44 | |
| 45 **Homer annotatePeaks** | |
| 46 | |
| 47 More information on accepted formats and options | |
| 48 | |
| 49 http://biowhat.ucsd.edu/homer/ngs/annotation.html | |
| 50 | |
| 51 TIP: use homer_bed2pos and homer_pos2bed to convert between the homer peak positions and the BED format. | |
| 52 | |
| 53 **Parameter list** | |
| 54 | |
| 55 Command line options (not all of them are supported):: | |
| 56 | |
| 57 Usage: annotatePeaks.pl <peak file | tss> <genome version> [additional options...] | |
| 58 | |
| 59 Available Genomes (required argument): (name,org,directory,default promoter set) | |
| 60 -- or -- | |
| 61 Custom: provide the path to genome FASTA files (directory or single file) | |
| 62 | |
| 63 User defined annotation files (default is UCSC refGene annotation): | |
| 64 annotatePeaks.pl accepts GTF (gene transfer formatted) files to annotate positions relative | |
| 65 to custom annotations, such as those from de novo transcript discovery or Gencode. | |
| 66 -gtf <gtf format file> (-gff and -gff3 can work for those files, but GTF is better) | |
| 67 | |
| 68 Peak vs. tss/tts/rna mode (works with custom GTF file): | |
| 69 If the first argument is "tss" (i.e. annotatePeaks.pl tss hg18 ...) then a TSS centric | |
| 70 analysis will be carried out. Tag counts and motifs will be found relative to the TSS. | |
| 71 (no position file needed) ["tts" now works too - e.g. 3' end of gene] | |
| 72 ["rna" specifies gene bodies, will automatically set "-size given"] | |
| 73 NOTE: The default TSS peak size is 4000 bp, i.e. +/- 2kb (change with -size option) | |
| 74 -list <gene id list> (subset of genes to perform analysis [unigene, gene id, accession, | |
| 75 probe, etc.], default = all promoters) | |
| 76 -cTSS <promoter position file i.e. peak file> (should be centered on TSS) | |
| 77 | |
| 78 Primary Annotation Options: | |
| 79 -mask (Masked repeats, can also add 'r' to end of genome name) | |
| 80 -m <motif file 1> [motif file 2] ... (list of motifs to find in peaks) | |
| 81 -mscore (reports the highest log-odds score within the peak) | |
| 82 -nmotifs (reports the number of motifs per peak) | |
| 83 -mdist (reports distance to closest motif) | |
| 84 -mfasta <filename> (reports sites in a fasta file - for building new motifs) | |
| 85 -fm <motif file 1> [motif file 2] (list of motifs to filter from above) | |
| 86 -rmrevopp <#> (only count sites found within <#> on both strands once, i.e. palindromic) | |
| 87 -matrix <prefix> (outputs motif co-occurrence files: | |
| 88 prefix.count.matrix.txt - number of peaks with motif co-occurrence | |
| 89 prefix.ratio.matrix.txt - ratio of observed vs. expected co-occurrence | |
| 90 prefix.logPvalue.matrix.txt - co-occurrence enrichment | |
| 91 prefix.stats.txt - table of pair-wise motif co-occurrence statistics | |
| 92 additional options: | |
| 93 -matrixMinDist <#> (minimum distance between motif pairs - to avoid overlap) | |
| 94 -matrixMaxDist <#> (maximum distance between motif pairs) | |
| 95 -mbed <filename> (Output motif positions to a BED file to load at UCSC (or -mpeak)) | |
| 96 -mlogic <filename> (will output stats on common motif orientations) | |
| 97 -d <tag directory 1> [tag directory 2] ... (list of experiment directories to show | |
| 98 tag counts for) NOTE: -dfile <file> where file is a list of directories in first column | |
| 99 -bedGraph <bedGraph file 1> [bedGraph file 2] ... (read coverage counts from bedGraph files) | |
| 100 -wig <wiggle file 1> [wiggle file 2] ... (read coverage counts from wiggle files) | |
| 101 -p <peak file> [peak file 2] ... (to find nearest peaks) | |
| 102 -pdist to report only distance (-pdist2 gives directional distance) | |
| 103 -pcount to report number of peaks within region | |
| 104 -vcf <VCF file> (annotate peaks with genetic variation information, one col per individual) | |
| 105 -editDistance (Computes the # bp changes relative to reference) | |
| 106 -individuals <name1> [name2] ... (restrict analysis to these individuals) | |
| 107 -gene <data file> ... (Adds additional data to result based on the closest gene. | |
| 108 This is useful for adding gene expression data. The file must have a header, | |
| 109 and the first column must be a GeneID, Accession number, etc. If the peak | |
| 110 cannot be mapped to data in the file then the entry will be left empty.) | |
| 111 -go <output directory> (perform GO analysis using genes near peaks) | |
| 112 -genomeOntology <output directory> (perform genomeOntology analysis on peaks) | |
| 113 -gsize <#> (Genome size for genomeOntology analysis, default: 2e9) | |
| 114 | |
| 115 Annotation vs. Histogram mode: | |
| 116 -hist <bin size in bp> (i.e. 1, 2, 5, 10, 20, 50, 100 etc.) | |
| 117 The -hist option can be used to generate histograms of position dependent features relative | |
| 118 to the center of peaks. This is primarily meant to be used with -d and -m options to map | |
| 119 distribution of motifs and ChIP-Seq tags. For ChIP-Seq peaks for a Transcription factor | |
| 120 you might want to use the -center option (below) to center peaks on the known motif | |
| 121 ** If using "-size given", histogram will be scaled to each region (i.e. 0-100%), with | |
| 122 the -hist parameter being the number of bins to divide each region into. | |
| 123 Histogram Mode specific Options: | |
| 124 -nuc (calculated mononucleotide frequencies at each position, | |
| 125 Will report by default if extracting sequence for other purposes like motifs) | |
| 126 -di (calculated dinucleotide frequencies at each position) | |
| 127 -histNorm <#> (normalize the total tag count for each region to 1, where <#> is the | |
| 128 minimum tag total per region - use to avoid tag spikes from low coverage) | |
| 129 -ghist (outputs profiles for each gene, for peak shape clustering) | |
| 130 -rm <#> (remove occurrences of same motif that occur within # bp) | |
| 131 | |
| 132 Peak Centering: (other options are ignored) | |
| 133 -center <motif file> (This will re-center peaks on the specified motif, or remove peak | |
| 134 if there is no motif in the peak. ONLY recentering will be performed, and all other | |
| 135 options will be ignored. This will output a new peak file that can then be reanalyzed | |
| 136 to reveal fine-grain structure in peaks (It is advised to use -size < 200) with this | |
| 137 to keep peaks from moving too far (-mirror flips the position) | |
| 138 -multi (returns genomic positions of all sites instead of just the closest to center) | |
| 139 | |
| 140 Advanced Options: | |
| 141 -len <#> / -fragLength <#> (Fragment length, default=auto, might want to set to 0 for RNA) | |
| 142 -size <#> (Peak size[from center of peak], default=inferred from peak file) | |
| 143 -size #,# (i.e. -size -10,50 count tags from -10 bp to +50 bp from center) | |
| 144 -size "given" (count tags etc. using the actual regions - for variable length regions) | |
| 145 -log (output tag counts as log2(x+1+rand) values - for scatter plots) | |
| 146 -sqrt (output tag counts as sqrt(x+rand) values - for scatter plots) | |
| 147 -strand <+|-|both> (Count tags on specific strands relative to peak, default: both) | |
| 148 -pc <#> (maximum number of tags to count per bp, default=0 [no maximum]) | |
| 149 -cons (Retrieve conservation information for peaks/sites) | |
| 150 -CpG (Calculate CpG/GC content) | |
| 151 -ratio (process tag values as ratios - i.e. chip-seq, or mCpG/CpG) | |
| 152 -nfr (report nucleosome free region scores instead of tag counts, also -nfrSize <#>) | |
| 153 -norevopp (do not search for motifs on the opposite strand [works with -center too]) | |
| 154 -noadj (do not adjust the tag counts based on total tags sequenced) | |
| 155 -norm <#> (normalize tags to this tag count, default=1e7, 0=average tag count in all directories) | |
| 156 -pdist (only report distance to nearest peak using -p, not peak name) | |
| 157 -map <mapping file> (mapping between peak IDs and promoter IDs, overrides closest assignment) | |
| 158 -noann, -nogene (skip genome annotation step, skip TSS annotation) | |
| 159 -homer1/-homer2 (by default, the new version of homer [-homer2] is used for finding motifs) | |
| 160 | |
| 161 | |
| 162 </help> | |
| 163 </tool> | |
| 164 |
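
For readers who want to see what the `<command>` block of this revision does outside Galaxy, here is a minimal Python sketch. It mirrors the shell line: run `annotatePeaks.pl <peak file> <genome version>`, send stdout to the annotated output, send stderr to the log, and print the fallback message on failure. The function names `build_command` and `run_annotate_peaks` and the `runner` parameter are hypothetical and not part of the tool. Like the wrapper's command, the sketch does not pass the "Extra options" parameter.

```python
import subprocess
import sys


def build_command(input_bed, genome="hg19"):
    """Return the argv list for: annotatePeaks.pl <peak file> <genome version>."""
    # Hypothetical helper; only the two positional arguments used by the
    # wrapper's <command> block are passed.
    return ["annotatePeaks.pl", input_bed, genome]


def run_annotate_peaks(input_bed, genome, out_annotated, out_log,
                       runner=subprocess.run):
    """Run the command with stdout -> out_annotated and stderr -> out_log.

    `runner` is an assumption added for illustration so the call can be
    replaced by a stub when annotatePeaks.pl is not installed.
    """
    with open(out_annotated, "w") as out, open(out_log, "w") as log:
        result = runner(build_command(input_bed, genome), stdout=out, stderr=log)
    if result.returncode != 0:
        # Same message as the `|| echo "Error running annotatePeaks." >&2` branch.
        print("Error running annotatePeaks.", file=sys.stderr)
    return result.returncode
```

Passing the arguments as a list avoids shell quoting issues with file names. A call such as `run_annotate_peaks("peaks.bed", "hg19", "annotated.txt", "run.log")` would need HOMER's `annotatePeaks.pl` on the `PATH`; the file names here are placeholders.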
