annotate BEDTools-Version-2.14.3/RELEASE_HISTORY @ 0:dfcd8b6c1bda

Uploaded
author aaronquinlan
date Thu, 03 Nov 2011 10:25:04 -0400
Version 2.14.2 (2-Nov-2011)

Bug Fixes
=========
1. Corrected the help for closestBed. It now correctly reads -io instead of -no.
2. Fixed a regression in closestBed, introduced in version 2.13.4, whereby B features to the right of an A feature were missed.

New tool
========
1. Added the multiIntersectBed tool for reporting common intervals among multiple **sorted** BED/GFF/VCF files.



Version 2.13.4 (26-Oct-2011)
Bug Fixes
=========
1. The -sorted option (chromsweep) in intersectBed now obeys -s and -S. I had neglected to implement that. Thanks to Paul Ryvkin for pointing this out.
2. The -split option was mistakenly splitting on D CIGAR ops.
3. The Makefile was not including zlib properly for newer versions of GCC. Thanks to Istvan Albert for pointing this out and providing the solution.

Improvements
============
1. Thanks to Jacob Biesinger for a new option (-D) in closestBed that will report _signed_ distances. Moreover, the new option allows fine control over whether the distances are reported based on the reference genome or based on the strand of the A or B feature. Many thanks to Jacob.
2. Thanks to some nice analysis from Paul Ryvkin, I realized that the -sorted option was using way too much memory in certain cases where there is a chromosome change in a sorted BED file. This has been corrected.
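
To make the idea of a signed distance concrete, here is a minimal Python sketch. It is not the closestBed code. The mode names ("ref", "a", "b"), the choice of negative values for B features at lower coordinates, and the distance of 0 for overlapping features are all assumptions made for illustration.

```python
def signed_distance(a, b, mode="ref"):
    """Signed gap between two features on the same chromosome.

    a, b: (start, end, strand) tuples with half-open coordinates.
    mode: hypothetical selector; "ref" uses genome order, "a" or "b"
          flips the sign when that feature is on the "-" strand.
    """
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    if b_start < a_end and b_end > a_start:
        return 0  # overlapping features (assumed convention)
    if b_end <= a_start:
        dist = -(a_start - b_end)  # B lies at lower coordinates than A
    else:
        dist = b_start - a_end     # B lies at higher coordinates than A
    if mode == "ref":
        return dist
    strand = a_strand if mode == "a" else b_strand
    return -dist if strand == "-" else dist
```

With this convention, a B feature ending 20 bp before A starts gives -20 relative to the reference, and +20 when measured from an A feature on the minus strand.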

Version 2.13.3 (30-Sept-2011)
Bug Fixes
=========
1. intersectBed detected, but did not report, overlaps when using BAM input and -bed.

Other
=====
1. Added a warning that -sorted trusts, but does not enforce, that the data is actually sorted.


Version 2.13.2 (23-Sept-2011)

New algorithm
=============
1. Preliminary release of the chrom_sweep algorithm.
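
For readers unfamiliar with the idea, the following is a generic sweep over two position-sorted interval lists, written in Python. It is a sketch of the general technique only, not the BEDTools implementation; the function name and the single-chromosome simplification are assumptions. Like -sorted, it trusts that its inputs are sorted and does not check.

```python
def sweep_overlaps(a, b):
    """Yield (a_interval, b_interval) overlap pairs from two sorted lists.

    Intervals are half-open (start, end) tuples on one chromosome, and
    both lists must already be sorted by start. This is not verified.
    """
    cache = []  # B intervals that may still overlap upcoming A intervals
    j = 0
    for a_start, a_end in a:
        # Drop cached B intervals that end at or before this A start;
        # they cannot overlap any later A interval either.
        cache = [iv for iv in cache if iv[1] > a_start]
        # Pull in B intervals that start before this A interval ends.
        while j < len(b) and b[j][0] < a_end:
            if b[j][1] > a_start:
                cache.append(b[j])
            j += 1
        for b_start, b_end in cache:
            if b_start < a_end and b_end > a_start:
                yield (a_start, a_end), (b_start, b_end)
```

Each file is read once, front to back, and only the B intervals that can still overlap are held in memory, which is why sorted input matters.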

New options
===========
1. genomeCoverageBed no longer requires a genome file when working with BAM input. It instead uses the BAM header.
2. tagBam now has a -score option for annotating alignments with the BED "score" field in annotation files. This overrides the default behavior, which is to use the -labels associated with the annotation files passed in on the command line.

Bug fixes
=========
1. Corrected a bug that prevented proper BAM support in intersectBed.
2. Improved detection of GFF features with negative coordinates.



Version 2.13.1 (6-Sept-2011)

New options
===========
1. tagBam now has -s and -S options for only annotating alignments with features on the same and opposite strand, respectively.
2. tagBam now has a -names option for annotating alignments with the "name" field in annotation files. This overrides the default behavior, which is to use the -labels associated with the annotation files passed in on the command line. Currently, this works well with BED files, but given the limited metadata support for GFF files, annotating with -names and GFF files may not work as well as wished, depending on the type of GFF file used.



Version 2.13.0 (1-Sept-2011)

New tools
=========
1. tagBam. This tool annotates a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files.
For example:
    $ tagBam -i aln.bam -files exons.bed introns.bed cpg.bed utrs.bed \
             -tags exonic intronic cpg utr \
             > aln.tagged.bam
For alignments that have overlaps, you should see new BAM tags like "YB:Z:exonic", "YB:Z:cpg;utr"

2. multiBamCov. The new tool counts sequence coverage for multiple BAM files at specific loci defined in a BED/GFF/VCF file.
For example:

    $ multiBamCov -bams aln.1.bam aln.2.bam aln.3.bam -bed exons.bed
    chr1 861306 861409 SAMD11 1 + 181 280 236
    chr1 865533 865718 SAMD11 2 + 249 365 374
    chr1 866393 866496 SAMD11 3 + 162 298 322

where the last 3 columns represent the number of alignments overlapping each interval from the three BAM files.

The following options are available to control which types of alignments are counted.
    -q  Minimum mapping quality allowed. Default is 0.

    -D  Include duplicate-marked reads. Default is to count non-duplicates only

    -F  Include failed-QC reads. Default is to count pass-QC reads only

    -p  Only count proper pairs. Default is to count all alignments with MAPQ
        greater than the -q argument, regardless of the BAM FLAG field.
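
The filters above can be pictured as a predicate on each alignment's FLAG and MAPQ. The Python sketch below is illustrative only and is not the multiBamCov source. The FLAG bit values come from the SAM specification; the function and parameter names are made up, and treating -q as an inclusive minimum (MAPQ >= q) is an assumption.

```python
DUPLICATE = 0x400    # SAM FLAG bit: PCR or optical duplicate
QC_FAIL = 0x200      # SAM FLAG bit: read fails quality checks
PROPER_PAIR = 0x2    # SAM FLAG bit: mapped in a proper pair


def is_counted(flag, mapq, min_mapq=0, include_dups=False,
               include_failed_qc=False, proper_pairs_only=False):
    """Return True if an alignment would be counted under these filters."""
    if mapq < min_mapq:                        # mirrors -q (assumed inclusive)
        return False
    if flag & DUPLICATE and not include_dups:  # mirrors -D
        return False
    if flag & QC_FAIL and not include_failed_qc:  # mirrors -F
        return False
    if proper_pairs_only and not flag & PROPER_PAIR:  # mirrors -p
        return False
    return True
```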

3. nucBed. This new tool profiles the nucleotide content of intervals in a FASTA file. The following information will be reported after each original BED/GFF/VCF entry:
    1) %AT content
    2) %GC content
    3) Number of As observed
    4) Number of Cs observed
    5) Number of Gs observed
    6) Number of Ts observed
    7) Number of Ns observed
    8) Number of other bases observed
    9) The length of the explored sequence/interval.
    10) The sequence extracted from the FASTA file. (optional, if -seq is used)
    11) The number of times a user defined pattern was observed. (optional, if -pattern is used.)
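
As a rough illustration of the fields listed above, this Python sketch computes the same kind of profile for one extracted sequence. It is not the nucBed implementation. Counting lowercase (soft-masked) bases like uppercase ones, dividing by the full sequence length, and counting overlapping pattern matches are all assumptions.

```python
def nucleotide_profile(seq, pattern=None):
    """Profile one sequence: AT/GC fractions, base counts, and length."""
    s = seq.upper()  # assumption: case is ignored
    counts = {base: s.count(base) for base in "ACGTN"}
    length = len(s)
    profile = {
        "pct_at": (counts["A"] + counts["T"]) / length if length else 0.0,
        "pct_gc": (counts["G"] + counts["C"]) / length if length else 0.0,
        "num_A": counts["A"],
        "num_C": counts["C"],
        "num_G": counts["G"],
        "num_T": counts["T"],
        "num_N": counts["N"],
        "num_oth": length - sum(counts.values()),
        "seq_len": length,
    }
    if pattern:
        p = pattern.upper()
        # assumption: overlapping occurrences are each counted
        profile["user_patt_count"] = sum(s.startswith(p, i) for i in range(length))
    return profile
```

A sequence with 155 As, 96 Cs, 119 Gs and 98 Ts (468 bases) gives an AT fraction of 253/468, which rounds to 0.540598.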



For example:
    $ nucBed -fi ~/data/genomes/hg18/hg18.fa -bed simrep.bed | head -3
    #1_usercol 2_usercol 3_usercol 4_usercol 5_usercol 6_usercol 7_pct_at 8_pct_gc 9_num_A 10_num_C 11_num_G 12_num_T 13_num_N 14_num_oth 15_seq_len
    chr1 10000 10468 trf 789 + 0.540598 0.459402 155 96 119 98 0 0 468
    chr1 10627 10800 trf 346 + 0.445087 0.554913 54 55 41 23 0 0 173


One can also report the sequence itself:
    $ nucBed -fi ~/data/genomes/hg18/hg18.fa -bed simrep.bed -seq | head -3
    #1_usercol 2_usercol 3_usercol 4_usercol 5_usercol 6_usercol 7_pct_at 8_pct_gc 9_num_A 10_num_C 11_num_G 12_num_T 13_num_N 14_num_oth 15_seq_len 16_seq
    chr1 10000 10468 trf 789 + 0.540598 0.459402 155 96 119 98 0 0 468 ccagggg...
    chr1 10627 10800 trf 346 + 0.445087 0.554913 54 55 41 23 0 0 173 TCTTTCA...

Or, one can count the number of times that a specific pattern occurs in the intervals (reported as the last column):
    $ nucBed -fi ~/data/genomes/hg18/hg18.fa -bed simrep.bed -pattern CGTT | head
    #1_usercol 2_usercol 3_usercol 4_usercol 5_usercol 6_usercol 7_pct_at 8_pct_gc 9_num_A 10_num_C 11_num_G 12_num_T 13_num_N 14_num_oth 15_seq_len 16_user_patt_count
    chr1 10000 10468 trf 789 + 0.540598 0.459402 155 96 119 98 0 0 468 0
    chr1 10627 10800 trf 346 + 0.445087 0.554913 54 55 41 23 0 0 173 0
    chr1 10757 10997 trf 434 + 0.370833 0.629167 49 70 81 40 0 0 240 0
    chr1 11225 11447 trf 273 + 0.463964 0.536036 44 86 33 59 0 0 222 0
    chr1 11271 11448 trf 187 + 0.463277 0.536723 37 69 26 45 0 0 177 0
    chr1 11283 11448 trf 199 + 0.466667 0.533333 37 64 24 40 0 0 165 0
    chr1 19305 19443 trf 242 + 0.282609 0.717391 17 57 42 22 0 0 138 1
    chr1 20828 20863 trf 70 + 0.428571 0.571429 10 7 13 5 0 0 35 0
    chr1 30862 30959 trf 79 + 0.556701 0.443299 35 22 21 19 0 0 97 0



New options
===========
1. Support for named pipes and FIFOs.
2. "-" is now allowable to indicate that data is being sent via stdin.

3. Multiple tools. Added new -S option to annotateBed, closestBed, coverageBed, intersectBed, pairToBed, subtractBed, and windowBed (-Sm). This new option does the opposite of the -s option: that is, overlaps are only processed if they are on _opposite_ strands. Thanks to Sol Katzman for the great suggestion. Very useful for certain RNA-seq analyses.

4. coverageBed. Added a new -counts option to coverageBed that only reports the count of overlaps, instead of also computing fractions, etc. This is much faster and uses much less memory.

5. fastaFromBed. Added a new -full option that uses the full BED entry when naming each output sequence. Also removed the -fo option such that all output is now written to stdout.

6. genomeCoverageBed.
    - Added new -scale option that allows the coverage values to be scaled by a constant. Useful for normalizing coverage with RPM, RPKM, etc. Thanks to Ryan Dale for the useful suggestion.
    - Added new -5, -3, -trackline, -trackopts, and -dz options. Many thanks to Assaf Gordon for these improvements.
        -5: Calculate coverage of 5' positions (instead of entire interval).
        -3: Calculate coverage of 3' positions (instead of entire interval).
        -trackline: Adds a UCSC/Genome-Browser track line definition in the first line of the output.
        -trackopts: Writes additional track line definition parameters in the first line.
        -dz: Report the depth at each genome position with zero-based coordinates, instead of one-based.
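
As an example of how a constant for -scale might be chosen, the Python sketch below derives a reads-per-million (RPM) factor and applies it to a few coverage records. The function names and the record layout (chrom, start, end, depth) are assumptions for illustration; only the idea of multiplying coverage by a constant comes from the notes above.

```python
def rpm_scale_factor(total_reads):
    """Constant that converts raw depth to reads per million mapped reads."""
    return 1_000_000 / total_reads


def scale_coverage(records, factor):
    """Multiply the depth column of (chrom, start, end, depth) records."""
    return [(chrom, start, end, depth * factor)
            for chrom, start, end, depth in records]
```

For a library of 2,000,000 reads the factor is 0.5, so a raw depth of 4 becomes 2.0.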

7. closestBed. See below, thanks to Brent Pedersen, Assaf Gordon, Ryan Layer and Dan Webster for the helpful discussions.
    - closestBed now reports _all_ features in B that overlap A by default. This allows folks to decide which is the "best" overlapping feature on their own.

    - closestBed now has a "-io" option that ignores overlapping features. In other words, it will only report the closest, non-overlapping feature.

An example:

    $ cat a.bed
    chr1 10 20

    $ cat b.bed
    chr1 15 16
    chr1 16 40
    chr1 100 1000
    chr1 200 1000

    $ bin/closestBed -a a.bed -b b.bed
    chr1 10 20 chr1 15 16
    chr1 10 20 chr1 16 40

    $ bin/closestBed -a a.bed -b b.bed -io
    chr1 10 20 chr1 100 1000
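
The behavior in the example above can be mimicked with a few lines of Python. This is a sketch of the described behavior for a single chromosome, not the closestBed source; the function name, the half-open overlap test, and reporting all ties are assumptions.

```python
def closest(a, b_features, ignore_overlaps=False):
    """Return the B features closest to A.

    a: (start, end); b_features: list of (start, end), all on one chromosome.
    By default every overlapping B feature is reported; with
    ignore_overlaps=True only the nearest non-overlapping ones are.
    """
    a_start, a_end = a
    overlapping = [b for b in b_features if b[0] < a_end and b[1] > a_start]
    if overlapping and not ignore_overlaps:
        return overlapping

    def gap(b):
        # Distance between A and a non-overlapping B feature.
        return b[0] - a_end if b[0] >= a_end else a_start - b[1]

    candidates = [b for b in b_features if b not in overlapping]
    if not candidates:
        return []
    best = min(gap(b) for b in candidates)
    return [b for b in candidates if gap(b) == best]
```

Run on the a.bed and b.bed intervals above, it returns the two overlapping features by default and only (100, 1000) when overlaps are ignored.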

Updates
=======
1. Updated to the latest version of BamTools. This allows greater functionality and will facilitate new options and tools in the future.


Bug Fixes
=========
1. GFF files cannot have zero-length features.
2. Corrected an erroneous check on the start coordinates in VCF files. Thanks to Jan Vogel for the correction.
3. mergeBed now always reports output in BED format.
4. Updated the text file Tokenizer function to yield 15% speed improvement.
5. Various tweaks and improvements.




Version 2.12.0 (April-3-2011)

New Tool
========
1. Added new tool called "flankBed", which allows one to extract solely the flanking regions that are upstream and downstream of a given feature. Unlike slopBed, flankBed does not include the original feature itself. A new feature is created for each flanking region. For example, imagine the following feature:

    chr1 100 200

The following would create features for solely the 10 bp regions flanking this feature.
    $ bin/flankBed -i a.bed -b 10 -g genomes/human.hg18.genome
    chr1 90 100
    chr1 200 210

In contrast, slopBed would return:
    $ bin/slopBed -i a.bed -b 10 -g genomes/human.hg18.genome
    chr1 90 210

flankBed has all of the same features as slopBed.
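
The difference between the two tools comes down to simple coordinate arithmetic, sketched here in Python. This is an illustration rather than the tools' own code; the function names and the clipping to the chromosome bounds (the role assumed for the -g genome file) are assumptions.

```python
def slop(start, end, size, chrom_len):
    """Extend a feature by `size` bases on both sides, clipped to the chromosome."""
    return (max(0, start - size), min(chrom_len, end + size))


def flank(start, end, size, chrom_len):
    """Return only the regions flanking a feature, not the feature itself."""
    left = (max(0, start - size), start)
    right = (end, min(chrom_len, end + size))
    # Skip a flank that was clipped away entirely at a chromosome end.
    return [iv for iv in (left, right) if iv[1] > iv[0]]
```

For the feature chr1 100 200 with a size of 10, flank gives (90, 100) and (200, 210), while slop gives (90, 210).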


New Features
============
1. Added new "-scores" feature to mergeBed. This allows one to take the sum, min, max, mean, median, mode, or antimode of merged feature scores. In addition, one can use the "collapse" operation to get a comma-separated list of the merged scores.
2. mergeBed now tolerates multiple features in a merged block having the same feature name.
3. Thanks to Erik Garrison's "fastahack" library, fastaFromBed now reports its output in the order of the input file.
4. Added a "-n" option to bed12ToBed6, which forces the score field to be the 1-based block number from the original BED12 feature. This is useful for tracking exon numbers, for example.
5. Thanks to Can Alkan, added a new "-mc" option to maskFastaFromBed that allows one to define a custom mask character, such as "X" (-mc X).
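
To show what the score operations in item 1 amount to, here is a small Python sketch that merges overlapping features and summarizes their scores. It is not the mergeBed implementation. Merging book-ended features, breaking mode and antimode ties by first appearance, and the function names are assumptions.

```python
from collections import Counter
from statistics import mean, median


def summarize(scores, op):
    """Apply one of the named operations to a list of scores."""
    counts = Counter(scores)
    ops = {
        "sum": lambda: sum(scores),
        "min": lambda: min(scores),
        "max": lambda: max(scores),
        "mean": lambda: mean(scores),
        "median": lambda: median(scores),
        "mode": lambda: counts.most_common()[0][0],       # most frequent
        "antimode": lambda: counts.most_common()[-1][0],  # least frequent
        "collapse": lambda: ",".join(str(s) for s in scores),
    }
    return ops[op]()


def merge_with_scores(features, op="sum"):
    """features: (start, end, score) tuples on one chromosome, sorted by start."""
    merged = []
    for start, end, score in features:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
            merged[-1][2].append(score)
        else:
            merged.append([start, end, [score]])
    return [(s, e, summarize(scores, op)) for s, e, scores in merged]
```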


Bug Fixes
=========
1. Thanks to Davide Cittaro, intersectBed and windowBed now properly capture unmapped BAM alignments when using the "-v" option.
2. closestBed now properly handles cases where b.end == a.start
3. Thanks to John Marshall, the default constructors are much safer and less buggy.
4. Fixed bug in shuffleBed that complained about a lack of -incl and -excl.
5. Fixed bug in shuffleBed for features that would go beyond the end of a chromosome.
6. Tweaked bedToIgv to make it more Windows friendly.



Version 2.11.2 (January-31-2010)
Fixed a coordinate reporting bug in coverageBed.
Added "max distance (-d)" argument back to the new implementation of mergeBed.



Version 2.11.0 (January-21-2010)

Enhancements:
=============
1. Support for zero length features (i.e., start = end)
    - For example, this allows overlaps to be detected with insertions in the reference genome, as reported by dbSNP.
2. Both 8 and 9 column GFF files are now supported.
3. slopBed can now extend features by a percentage of their size (-pct) instead of just a fixed number of bases.
4. Two improvements to shuffleBed:
    4a. A -f (overlapFraction) parameter that defines the maximum overlap that a randomized feature can have with an -excl feature. That is, if a chosen locus has more than -f overlap with an -excl feature, a new locus is sought.
    4b. A new -incl option (thanks to Michael Hoffman and Davide Cittaro) that defines intervals in which the randomized features should be placed. This is used instead of placing the features randomly in the genome. Note that a genome file is still required so that a randomized feature does not go beyond the end of a chromosome.
5. bamToBed can now optionally report the CIGAR string as an additional field.
6. pairToPair can now report the entire paired feature from the B file when overlaps are found.
7. complementBed now reports all chromosomes, not just those with features in the BED file.
8. Improved randomization seeding in shuffleBed. This prevents identical output for runs of shuffleBed that occur in the same second (often the case).
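
The retry rule behind the shuffleBed -f parameter can be sketched in Python as follows. This is only an illustration of "pick a locus, reject it if it overlaps an excluded region too much, try again", not the shuffleBed code. Measuring the overlap as a fraction of the randomized feature's length, the retry cap, and all names are assumptions.

```python
import random


def overlap_fraction(start, end, excl_start, excl_end):
    """Fraction of [start, end) that is covered by [excl_start, excl_end)."""
    overlap = min(end, excl_end) - max(start, excl_start)
    return max(0, overlap) / (end - start)


def place_feature(length, chrom_len, excluded, max_frac, rng, max_tries=1000):
    """Pick a random locus whose overlap with every excluded region is <= max_frac.

    Returns (start, end), or None if no acceptable locus is found in time.
    """
    for _ in range(max_tries):
        start = rng.randint(0, chrom_len - length)  # keeps the feature on the chromosome
        end = start + length
        if all(overlap_fraction(start, end, s, e) <= max_frac for s, e in excluded):
            return (start, end)
    return None
```

Passing an explicitly seeded generator, for example random.Random(42), makes a run repeatable, while distinct seeds avoid the identical output described in item 8.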


Bug Fixes:
==========
1. Fixed the "BamAlignmentSupportData is private" compilation issue.
2. Fixed a bug in windowBed that caused positions to run off the end of a chromosome.


Major Changes:
==============
1. The groupBy command is now part of the filo package (https://github.com/arq5x/filo) and will no longer be distributed with BEDTools.
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
272
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
273
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
274
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
275 Version 2.10.0 (September-21-2010)
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
276 ==New tools==
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
277 1. annotateBed. Annotates one BED/VCF/GFF file with the coverage and number of overlaps observed
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
278 from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature coincides with multiple other feature types with a single command. For example, the following will annotate the fraction of the variants in variants.bed that are covered by genes, conservaed regions and know variation, respectively.
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
279 $ annotateBed -i variants.bed -files genes.bed conserv.bed known_var.bed
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
280
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
281 This tool was suggested by Can Alkan and was motivated by the example source code that he kindly provided.
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
282
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
283 ==New features==
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
284 1. New frequency operations (freqasc and freqdesc) added to groupBy. These operations report a histogram of the frequency that each value is observed in a given column.
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
285 2. Support for writing uncompressed bam with the -ubam option.
286 3. Shorthand arguments for groupBy (-g eq. -grp, -c eq. -opCols, -o eq. -ops).
287 4. In addition, all BEDTools that require only one main input file (the -i file) will assume that input is
288 coming from standard input if the -i parameter is omitted. For example, the following are equivalent:
289 $ cat snps.bed | sortBed -i stdin
290 $ cat snps.bed | sortBed
291
292 As are these:
293 $ cat data.txt | groupBy -i stdin -g 1,2,3 -c 5 -o mean
294 $ cat data.txt | groupBy -g 1,2,3 -c 5 -o mean
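
The frequency operations from item 1 above can be illustrated with a small Python sketch. This is not the groupBy implementation: the function name freq_summary and the "value:count" output layout are assumptions made for illustration only.

```python
from collections import Counter

def freq_summary(values, descending=True):
    """Return 'value:count' pairs joined by commas, ordered by count."""
    counts = Counter(values)
    # Sort by count; ties are broken by value so the output is deterministic.
    ordered = sorted(
        counts.items(),
        key=lambda kv: (-kv[1] if descending else kv[1], kv[0]),
    )
    return ",".join(f"{value}:{count}" for value, count in ordered)

column = ["1", "2", "2", "3", "3", "3"]
print(freq_summary(column, descending=True))   # freqdesc-like ordering
print(freq_summary(column, descending=False))  # freqasc-like ordering
```

The real tool computes this per group; the sketch handles a single group's column only.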
295
296 ==Bug fixes==
297 1. Increased the precision of the output from groupBy.
298
299
300
301 Version 2.9.0 (August-16-2010)
302 ==New tools==
303 1. unionBedGraphs. This is a very powerful new tool contributed by Assaf Gordon from CSHL. It will combine/merge multiple BEDGRAPH files into a single file, thus allowing comparisons of coverage (or any text-value) across multiple samples.
304
305 ==New features==
306 1. New "distance feature" (-d) added to closestBed by Erik Arner. In addition to finding the closest feature to each feature in A, the -d option will report the distance to the closest feature in B. Overlapping features have a distance of 0.
307 2. New "per base depth feature" (-d) added to coverageBed. This reports the per base coverage (1-based) of each feature in file B based on the coverage of features found in file A. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b).
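
As a rough illustration of the per-base depth idea in item 2, the following Python sketch counts how many A intervals cover each base of one B feature. It assumes all intervals are on the same chromosome and use 0-based, half-open BED coordinates; the function name per_base_depth is hypothetical and the output layout is not taken from coverageBed.

```python
def per_base_depth(target, reads):
    """Yield (1-based position within target, depth) for one target interval.

    target and reads are (start, end) pairs in 0-based, half-open coordinates.
    """
    t_start, t_end = target
    depth = [0] * (t_end - t_start)
    for r_start, r_end in reads:
        # Clip each read to the target before counting its bases.
        for pos in range(max(r_start, t_start), min(r_end, t_end)):
            depth[pos - t_start] += 1
    for offset, d in enumerate(depth, start=1):
        yield offset, d

target = (100, 105)                          # e.g. a capture target from -b
reads = [(98, 102), (101, 104), (103, 110)]  # e.g. sequencing reads from -a
for position, d in per_base_depth(target, reads):
    print(position, d)
```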
308
309
310 ==Bug Fixes==
311 1. Fixed bug in closestBed preventing closest features from being found for A features with start coordinates < 2048000. Thanks to Erik Arner for pointing this out.
312 2. Fixed minor reporting annoyances in closestBed. Thanks to Erik Arner.
313 3. Fixed typo/bug in genomeCoverageBed that reported negative coverage owing to numeric overflow. Thanks to Alexander Dobin for the detailed bug report.
314 4. Fixed other minor parsing and reporting bugs/annoyances.
315
316
317
318
319 Version 2.8.3 (July-25-2010)
320 1. Fixed bug that caused some GFF files to be misinterpreted as VCF. This prevented the detection of overlaps.
321 2. Added a new "-tag" option in bamToBed that allows one to choose the _numeric_ tag that will be used to populate the score field. For example, one could populate the score field with the alignment score with "-tag AS".
322 3. Updated the BamTools API.
323
324
325 Version 2.8.2 (July-18-2010)
326 1. Fixed a bug in bedFile.h preventing GFF strands from being read properly.
327 2. Fixed a bug in intersectBed that occasionally caused spurious overlaps between BAM alignments and BED features.
328 3. Fixed bug in intersectBed causing -r to not report the same result when files are swapped.
329 4. Added checks to groupBy to prevent the selection of improper opCols and groups.
330 5. Fixed various compilation issues, esp. for groupBy, bedToBam, and bedToIgv.
331 6. Updated the usage statements to reflect bed/gff/vcf support.
332 7. Added new fileType functions for auto-detecting gzipped or regular files. Thanks to Assaf Gordon.
333
334
335 Version 2.8.1 (July-05-2010)
336 1. Added bedToIgv.
337
338
339 Version 2.8.0 (July-04-2010)
340
341 1. Proper support for "split" BAM alignments and "blocked" BED (aka BED12) features. By using the "-split" option, intersectBed, coverageBed, genomeCoverageBed, and bamToBed will now correctly compute overlaps/coverage solely for the "split" portions of BAM alignments or the "blocks" of BED12 features such as genes.
342
343 2. Added native support for the 1000 Genomes Variant Call Format (VCF) version 4.0.
344
345 3. New bed12ToBed6 tool. This tool will convert each block of a BED12 feature into discrete BED6 features.
346
347 4. New groupBy tool. This tool mimics the "GROUP BY" clause in SQL. Given a file or stream that is sorted by the appropriate "grouping columns", groupBy will compute summary statistics on another column in the file or stream. This will work with output from all BEDTools as well as any other tab-delimited file or stream. Example summary operations include: sum, mean, stdev, min, max, etc. Please see the tool's help for examples. The functionality in groupBy was motivated by helpful discussions with Erik Arner at Riken.
348
349 5. Improvements to genomeCoverageBed. Applied several code improvements provided by Assaf Gordon at CSHL. Most notably, beyond the several efficiency and organizational changes he made, he included a "-strand" option which allows one to specify that coverage should only be computed on either the "+" or the "-" strand.
350
351 6. Fixed a bug in closestBed found by Erik Arner (Riken) which incorrectly reported "null" overlaps for features that did not have a closest feature in the B file.
352
353 7. Fixed a careless bug in slopBed also found by Erik Arner (Riken) that caused an infinite loop when the "-excl" option was used.
354
355 8. Reduced memory consumption by ca. 15% and run time by ca. 10% for most tools.
356
357 9. Several code-cleanliness updates such as templated functions and common typedefs.
358
359 10. Tweaked the genome binning approach such that 16kb bins are the most granular.
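
The grouping behavior described for groupBy in item 4 can be sketched in Python with itertools.groupby, which, like the description above, only merges consecutive rows and therefore needs input sorted on the grouping columns. The function name group_by and its parameters are hypothetical and are not the tool's interface.

```python
from itertools import groupby
from statistics import mean

def group_by(rows, group_cols, op_col, op=mean):
    """Summarize op_col for each run of rows sharing the same group_cols.

    Column numbers are 1-based. Rows must already be sorted (grouped)
    on group_cols, because only consecutive rows are merged.
    """
    def key(row):
        return tuple(row[c - 1] for c in group_cols)

    for group_key, members in groupby(rows, key=key):
        values = [float(row[op_col - 1]) for row in members]
        yield group_key + (op(values),)

rows = [
    ("chr1", "10", "20", "a", "4"),
    ("chr1", "10", "20", "b", "6"),
    ("chr1", "30", "40", "c", "9"),
]
for summary in group_by(rows, group_cols=(1, 2, 3), op_col=5):
    print(*summary, sep="\t")
```

Passing op=sum, op=min or op=max swaps in a different summary operation.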
360
361
362 Version 2.7.1 (May-06-2010)
363 Fixed a typo that caused some compilers to fail on closestBed.
364
365 Version 2.7.0 (May-05-2010)
366
367 General:
368 1. "Gzipped" BED and GFF files are now supported as input by all BEDTools. Such files must end in ".gz".
369 2. Tools that process BAM alignments now uniformly compute an ungapped alignment end position based on the BAM CIGAR string. Specifically, "M", "D" and "N" operations are observed when computing the end position.
370 3. bamToBed requires the BAM file to be sorted/grouped by read id when creating BEDPE output. This allows the alignment end coordinate for each end of the pair to be properly computed based on its CIGAR string. The same requirement applies to pairToBed.
371 4. Updated manual.
372 5. Many silent modifications to the code that improve clarity and sanity-checking and facilitate future additions/modifications.
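
The end-position rule from item 2 (only "M", "D" and "N" operations advance the end coordinate) can be sketched in Python as follows. This is an illustration under that stated rule, not the BEDTools source; the function name alignment_end is hypothetical, and the sketch deliberately follows the note above rather than handling every operation in the SAM specification.

```python
import re

# One CIGAR element: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def alignment_end(start, cigar):
    """Return the end coordinate of an alignment that begins at `start`.

    Only M, D and N operations are counted, as described in the note above.
    """
    end = start
    for length, op in CIGAR_OP.findall(cigar):
        if op in "MDN":
            end += int(length)
    return end

print(alignment_end(100, "10M2I5M3D20M"))   # insertion (I) is skipped
print(alignment_end(100, "5S20M1000N30M"))  # soft clip (S) is skipped
```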
373
374
375 New Tools:
376 1. bedToBam. This utility will convert BED files to BAM format. Both "blocked" (aka BED12) and "unblocked" (e.g. BED6) formats are acceptable. This allows one to, for example, compress large BED files such as dbSNP into BAM format for efficient visualization.
377
378
379 Changes to existing tools:
380 intersectBed
381 1. Added -wao option to report 0 overlap for features in A that do not intersect any features in B. This is an extension of the -wo option.
382
383 bamToBed
384 1. Requires that BAM input be sorted/grouped by read name.
385
386 pairToBed
387 1. Requires that BAM input be sorted/grouped by read name.
388 2. Allows use of minimum mapping quality or total edit distance for score field.
389
390 windowBed
391 1. Now supports BAM input.
392
393 genomeCoverageBed
394 1. -bga option. Thanks to Assaf Gordon for the suggestion.
395 2. Eliminated potential seg fault.
396
397 Acknowledgements:
398 1. Assaf Gordon: for suggesting the -bga option in genomeCoverageBed and for testing the new bedToBam utility.
399 2. Ivan Gregoretti: for helping to expedite the inclusion of gzip support.
400 3. Can Alkan: for suggesting the addition of the -wao option to intersectBed.
401 4. James Ward: for pointing out that bedToBam did not need to create "dummy" seq and qual entries.
402
403
404
405 Version 2.6.1 (Mar-29-2010)
406 1. Fixed a careless command line parsing bug in coverageBed.
407
408
409 Version 2.6.0 (Mar-23-2010)
410 ***Specific improvements / additions to tools***
411 1. intersectBed
412 * Added an option (-wo) that reports the number of overlapping bases for each intersection between the A and B files.
413 -- Not sure why this wasn't added sooner; it's obvious.
414
415 2. coverageBed
416 * native BAM support
417 * can now report a histogram (-hist) of coverage for each feature in B. Useful for exome sequencing projects, for example.
418 -- thanks for the excellent suggestion from Jose Bras
419 * faster
420
421 3. genomeCoverageBed
422 * native BAM support
423 * can now report coverage in BEDGRAPH format (-bg)
424 -- thanks for the code and great suggestion from Assaf Gordon, CSHL.
425
426 4. bamToBed
427 * support for "blocked" BED (aka BED12) format. This facilitates the creation of BED entries for "split" alignments (e.g. RNAseq or SV)
428 -- thanks to Ann Loraine, UNCC for test data to support this addition.
429
430 5. fastaFromBed
431 * added the ability to extract sequences from a FASTA file according to the strand in the BED file. That is, when "-" the extracted sequence is reverse complemented.
432 -- thanks to Thomas Doktor, U. of Southern Denmark for the code and suggestion.
433
434 6. ***NEW*** overlap
435 * newly added tool for computing the overlap/distance between features on the same line.
436 -- For example:
437 $ cat test.out
438 chr1 10 20 A chr1 15 25 B
439 chr1 10 20 C chr1 25 35 D
440
441 $ cat test.out | overlap -i stdin -cols 2,3,6,7
442 chr1 10 20 A chr1 15 25 B 5
443 chr1 10 20 C chr1 25 35 D -5
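
The numbers in the example above (5 and -5) are consistent with a simple rule: the smaller of the two ends minus the larger of the two starts. The Python sketch below reproduces them under that assumption; the function name overlap_or_distance is hypothetical and the arithmetic is inferred from the example, not taken from the tool's source.

```python
def overlap_or_distance(s1, e1, s2, e2):
    """Positive: bases shared by the two intervals; negative: gap between them."""
    return min(e1, e2) - max(s1, s2)

lines = [
    "chr1\t10\t20\tA\tchr1\t15\t25\tB",
    "chr1\t10\t20\tC\tchr1\t25\t35\tD",
]
cols = (2, 3, 6, 7)  # 1-based columns holding start1, end1, start2, end2
for line in lines:
    fields = line.split("\t")
    s1, e1, s2, e2 = (int(fields[c - 1]) for c in cols)
    # Append the computed value as an extra column, as in the example output.
    print(line, overlap_or_distance(s1, e1, s2, e2), sep="\t")
```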
444
445 ***Bug fixes***
446 1. Fixed a bug in pairToBed when comparing paired-end BAM alignments to BED annotations and using the "notboth" option.
447 2. Fixed an idiotic bug in intersectBed that occasionally caused segfaults when blank lines existed in BED files.
448 3. Fixed a minor bug in mergeBed when using the -nms option.
449
450 ***General changes***
451 1. Added a proper class for genomeFiles. The code is much cleaner and the tools are less sensitive to minor problems with the formatting of genome files. Per Assaf Gordon's wise suggestion, the tools now support "chromInfo" files directly downloaded from UCSC. Thanks Assaf---I disagreed at first, but you were right.
452 2. Cleaned up some of the code and made the API a bit more streamlined. Will facilitate future tool development, etc.
453
454
455 Version 2.5.4 (Mar-3-2010)
456 1. Fixed an insidious bug that caused malformed BAM output from intersectBed and pairToBed. The previous BAM files worked fine with samtools as BAM input, but when piped in as SAM, there was an extra tab that thwarted conversion from SAM back to BAM. Many thanks to Ivan Gregoretti for reporting this bug. I had never used the BAM output in this way and thus never caught the bug!
457
458
459 Version 2.5.3 (Feb-19-2010)
460 1. Fixed bug to "re-allow" "track" and "browser" lines.
461 2. Fixed bug in reporting BEDPE overlaps.
462 3. Fixed bug when using type "notboth" with BAM files in pairToBed.
463 4. When comparing BAM files to BED/GFF annotations with intersectBed or pairToBed, the __aligned__ sequence is used, rather than the __original__ sequence.
464 5. Greatly increased the speed of pairToBed when using BAM alignments.
465 6. Fixed a bug in bamToBed when reporting edit distance from certain aligners.
466
467
468 Version 2.5.2 (Feb-2-2010)
469 1. The start and end coordinates for BED and BEDPE entries created by bamToBed are now based on the __aligned__ sequence, rather than the original sequence. It's obvious, but I missed it originally...sorry.
470 2. Added an error message to mergeBed preventing one from using "-n" and "-nms" together.
471 3. Fixed a bug in pairToBed that caused neither -type "notispan" nor "notospan" to behave as described.
472
473
474 Version 2.5.1 (Jan-28-2010)
475 1. Fixed a bug in the new GFF/BED determinator that caused a segfault when start = 0.
476
477
478 Version 2.5.0 (Jan-27-2010)
479 1. Added support for custom BED fields after the 6th column.
480 2. Fixed a command line parsing bug in pairToBed.
481 3. Improved sanity checking.
482
483
484 Version 2.4.2 (Jan-23-2010)
485 1. Fixed a minor bug in mergeBed when -nms and -s were used together.
486 2. Improved the command line parsing to prevent the occasional segfault.
487
488
489 Version 2.4.1 (Jan-12-2010)
490 1. Updated BamTools libraries to remove some compilation issues on some systems/compilers.
491
492
493 Version 2.4.0 (Jan-11-2010)
494 1. Added BAM support to intersectBed and pairToBed
495 2. New bamToBed feature.
496 3. Added support for GFF features
497 4. Added support for "blocked" BED format (BED12)
498 6. Wrote complete manual and included it in distribution.
499 7. Fixed several minor bugs.
500 8. Cleaned up code and improved documentation.
501
502
503 Version 2.3.3 (12/17/2009)
504 Rewrote complementBed to use a slower but much simpler approach. This resolves several bugs with the previous logic.
505
506
507 Version 2.3.2 (11/25/2009)
508 Fixed a bug in subtractBed that prevented a file from subtracting itself when the following is used:
509 $ subtractBed -a test.bed -b test.bed
510
511
512 Version 2.3.1 (11/19/2009)
513 Fixed a typo in closestBed that caused all nearby features to be returned instead of just the closest one.
514
515
516 Version 2.3.0 (11/18/2009)
517 1. Added four new tools:
518 - shuffleBed. Randomly permutes the locations of a BED file among a genome. Useful for testing for significant overlap enrichments.
519 - slopBed. Adds a requested number of base pairs to each end of a BED feature. Constrained by the size of each chromosome.
520 - maskFastaFromBed. Masks a FASTA file based on BED coordinates. Useful for making custom genome files from targeted capture experiments, etc.
521 - pairToPair. Returns overlaps between two paired-end BED files. This is great for finding structural variants that are private or shared among samples.
522
523 2. Increased the speed of intersectBed by nearly 50%.
524 3. Improved and corrected some of the help messages.
525 4. Improved sanity checking for BED entries.
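
The slopBed behavior listed under item 1 (extend each end of a feature, constrained by chromosome size) amounts to clamped arithmetic. A minimal Python sketch follows, assuming 0-based, half-open coordinates; the function name slop and the chrom_sizes dictionary are hypothetical stand-ins for the tool and its genome file.

```python
def slop(start, end, bases, chrom_size):
    """Extend an interval by `bases` on each side, clipped to [0, chrom_size]."""
    return max(0, start - bases), min(chrom_size, end + bases)

chrom_sizes = {"chr1": 1000}  # hypothetical genome file contents

print(slop(50, 100, 75, chrom_sizes["chr1"]))   # clipped at the chromosome start
print(slop(900, 980, 75, chrom_sizes["chr1"]))  # clipped at the chromosome end
```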
526
527
528 Version 2.2.4 (10/27/2009)
529 1. Updated the mergeBed documentation to describe the -names option which allows one to report the names of the
530 features that were merged (separated by semicolons).
531
532
533 Version 2.2.3 (10/23/2009)
534 1. Changed windowBed to optionally define "left" and "right" windows based on strand. For example by default, -l 100 and -r 500 will
535 add 100 bases to the left (lower coordinates) of a feature in A when scanning for hits in B and 500 bases to the right (higher coordinates).
536
537 However if one chooses the -sw option (windows based on strandedness), the behavior changes. Assume the above example except that a feature in A
538 is on the negative strand ("-"). In this case, -l 100, -r 500 and -sw will add 100 bases to the right (higher coordinates) and 500 bases to the left (lower coordinates).
539
540 In addition, there is a separate option (-sm) that can optionally force hits in B to only be tracked if they are on the same strand as A.
541
542 ***NOTE: This replaces the previous -s option and may affect existing pipelines***.
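
The -l/-r/-sw behavior described above can be summarized in a short Python sketch. It assumes 0-based coordinates and clamps the window start at 0; the function name search_window and its parameters are hypothetical and only mirror the rule stated in the text.

```python
def search_window(start, end, strand, left, right, strand_aware=False):
    """Return the (start, end) window scanned around a feature in A."""
    if strand_aware and strand == "-":
        # With -sw on the minus strand, -l applies to the higher-coordinate
        # side and -r to the lower-coordinate side, so the two are swapped.
        left, right = right, left
    return max(0, start - left), end + right

feature = (1000, 1100)
print(search_window(*feature, "-", left=100, right=500))                     # default
print(search_window(*feature, "-", left=100, right=500, strand_aware=True))  # with -sw
```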
543
544
545 Version 2.2.2 (10/20/2009)
546 1. Improved the speed of genomeCoverageBed by roughly 100 fold. The memory usage is now less than 2.0 GB.
547
548
549 Version 2.2.1
550 1. Fixed a very obvious bug in subtractBed that caused improper behavior when a feature in A was overlapped by more than one feature in B.
551 Many thanks to folks in the Hannon lab at CSHL for pointing this out.
552
553
554 Version 2.2.0
555 === Notable changes in this release ===
556 1. coverageBed will optionally only count features in BED file A (e.g. sequencing reads) that overlap with
557 the intervals/windows in BED file B on the same strand. This has been requested several times recently
558 and facilitates ChIP-Seq and RNA-Seq experiments.
559
560 2. intersectBed can now require a minimum __reciprocal__ overlap between intervals in BED A and BED B. For example,
561 previously, if one used -f 0.90, it required that a feature in B overlap 90% of the feature in A for the "hit"
562 to be reported. If one adds the -r (reciprocal) option, the hit must also cover 90% of the feature in B. This helps
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
563 to exclude overlaps between say small features in A and large features in B:
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
564
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
565 A ==========
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
566 B **********************************************************
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
567
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
568 -f 0.50 (Reported), whereas -f 0.50 -r (Not reported)
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
569
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
570 3. The score field has been changed to be a string. While this deviates from the UCSC definition, it allows one to track
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
571 much more meaningful information about a feature/interval. For example, score could now be:
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
572
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
573 7.31E-05 (a p-value)
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
574 0.334577 (mean enrichment)
dfcd8b6c1bda Uploaded
aaronquinlan
parents:
diff changeset
575 2:2.2:40:2 (several values encoded in a string)

4. closestBed now, by default, reports __all__ intervals in B that overlap equally with an interval in A. Previously, it
   merely reported the first such feature that appeared in B. Here's a cartoon explaining the difference.

   **Prior behavior**

   A      ==============
   B.1            ++++++++++++++
   B.2            ++++++++++++++
   B.3                              +++++++++

   -----------------------------------------
   Result = B.1   ++++++++++++++


   **Current behavior**

   A      ==============
   B.1            ++++++++++++++
   B.2            ++++++++++++++
   B.3                              +++++++++

   -----------------------------------------
   Result = B.1   ++++++++++++++
            B.2   ++++++++++++++

   Using the -t option, one can also choose to report either the first or the last entry in B in the event of a tie.

5. Several other minor changes to the algorithms have been made to increase speed a bit.
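
The reciprocal-overlap rule (item 2) and the tie handling (item 4) can be illustrated with a minimal Python sketch. This is not BEDTools code: the function names, the half-open (start, end) coordinates, and the tie argument with values "all", "first" and "last" are assumptions made for illustration, and the sketch only considers features in B that actually overlap A.

```python
def overlap_len(a, b):
    """Bases shared by two half-open intervals; 0 if they do not overlap."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_hit(a, b, f=1e-9, reciprocal=False):
    """Sketch of the -f / -r rule: f is the minimum overlap as a fraction of A.

    With reciprocal=True the overlap must also cover fraction f of B.
    """
    ov = overlap_len(a, b)
    if ov == 0:
        return False
    if ov / (a[1] - a[0]) < f:
        return False
    if reciprocal and ov / (b[1] - b[0]) < f:
        return False
    return True


def best_overlaps(a, b_features, tie="all"):
    """Features in B with the largest overlap with A, kept in file order.

    tie="all" returns every equally good feature; "first" and "last"
    (hypothetical names) keep only one of them.
    """
    scored = [(overlap_len(a, b), b) for b in b_features]
    best = max((ov for ov, _ in scored), default=0)
    if best == 0:
        return []
    ties = [b for ov, b in scored if ov == best]
    if tie == "first":
        return ties[:1]
    if tie == "last":
        return ties[-1:]
    return ties


# Small A, large B, as in the cartoon for item 2.
small_a, large_b = (100, 110), (101, 159)
print(is_hit(small_a, large_b, f=0.5))                   # True: 9 of 10 bases in A
print(is_hit(small_a, large_b, f=0.5, reciprocal=True))  # False: 9 of 58 bases in B

# B.1 and B.2 overlap A equally; B.3 does not overlap, as in the cartoon for item 4.
a = (0, 14)
b_features = [(8, 22, "B.1"), (8, 22, "B.2"), (26, 35, "B.3")]
print([b[2] for b in best_overlaps(a, b_features)])      # ['B.1', 'B.2']
```

Passing tie="first" or tie="last" to best_overlaps mirrors the older single-result behavior and its counterpart.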


VERSION 2.1.2
1. Fixed yet another bug in the parsing of "track" or "browser" lines. Sigh...
2. Changed the "score" column (i.e. column 5) to be stored as a string. While this deviates
   from the UCSC convention, it allows significantly more information to be packed into the column.
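
To show what a string-valued score column means for downstream code, here is a minimal Python sketch. It is an illustration only, not the BEDTools parser: parse_bed6 and decode_score are hypothetical helpers, and treating ":" as the separator for packed values is an assumption based on the "2:2.2:40:2" style of score.

```python
def parse_bed6(line):
    """Split one tab-delimited BED line (6+ columns), keeping the score as text."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "name": name,
        "score": score,  # deliberately left as a string
        "strand": strand,
    }


def decode_score(score, sep=":"):
    """Hypothetical helper: read a string score as one or more floats."""
    return [float(part) for part in score.split(sep)]


feature = parse_bed6("chr1\t100\t200\tpeak1\t2:2.2:40:2\t+\n")
print(feature["score"])                # 2:2.2:40:2
print(decode_score(feature["score"]))  # [2.0, 2.2, 40.0, 2.0]
print(decode_score("7.31E-05"))        # [7.31e-05]
```

Keeping the column as text means the tools pass it through unchanged, and any numeric interpretation is left to the user.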


VERSION 2.1.1
1. Added limits.h to bedFile.h to fix compilation issues on some systems.
2. Fixed bug in testing for "track" or "browser" lines.


VERSION 2.1.0
1. Fixed a bug in peIntersectBed that prevented -a from being correctly handled when passed via stdin.
2. Added new functionality to coverageBed that calculates the density of coverage.
3. Fixed bug in genomeCoverageBed.


VERSION 2.0.1
1. Added the ability to retain UCSC browser track/browser headers in BED files.


VERSION 2.0
1. Sped up the file parsing. ~10-20% increase in speed.
2. Created reportBed() as a common method in the bedFile class. Cleans up the code quite nicely.
3. Added the ability to compare BED files accounting for strandedness.
4. Paired-end intersect.
5. Fixed bug that prevented overlaps from being reported when the overlap fraction requested is 1.0.



VERSION 1.2, 04/27/2009. (1eb06115bdf3c49e75793f764a70c3501bb53f33)
1. Added subtractBed.
   A. Fixed bug that prevented "split" overlaps from being reported.
   B. Prevented A from being reported if >=1 feature in B completely spans it.
2. Added linksBed.
3. Added the ability to define separate windows for upstream and downstream to windowBed.


VERSION 1.1, 04/23/2009. (b74eb1afddca9b70bfa90ba763d4f2981a56f432)
Initial release.