annotate barcode_collapse_pe.py @ 0:38a8018e57a5 draft default tip

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
author bgruening
date Sat, 11 Jun 2016 10:46:34 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
1 __author__ = 'gpratt'
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
2 """
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
3
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
4 barcode_collapse.py read in a .bam file where the
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
5 first 9 nt of the read name
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
6 are the barcode and merge reads mapped to the same position that have the same barcode
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
7
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
8 """
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
9
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
10 from collections import Counter
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
11 import itertools
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
12 from optparse import OptionParser
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
13 import sys
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
14 import pysam
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
15
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
16
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
17 def stranded_read_start(read):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
18 if read.is_reverse:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
19 return read.positions[-1]
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
20 else:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
21 return read.pos
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
22
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
23
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
24 def output_metrics(metrics_file, total_count, removed_count):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
25 with open(metrics_file, 'w') as metrics:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
26 metrics.write("\t".join(["randomer", "total_count", "removed_count"]) + "\n")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
27 for barcode in total_count.keys():
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
28 metrics.write("\t".join(map(str, [barcode, total_count[barcode], removed_count[barcode]])) + "\n")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
29
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
30
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
31 def barcode_collapse(in_bam, out_bam):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
32 number_of_unmapped_mate_pairs = 0
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
33 different_chroms = 0
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
34 removed_count = Counter()
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
35 total_count = Counter()
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
36 result_dict = {}
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
37
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
38 with pysam.Samfile(in_bam, 'r') as samfile1:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
39 with pysam.Samfile(in_bam, 'r') as samfile2:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
40 samfile_read1 = itertools.islice(samfile1, 0, None, 2)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
41 samfile_read2 = itertools.islice(samfile2, 1, None, 2)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
42 for read1, read2 in itertools.izip(samfile_read1, samfile_read2):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
43 if not read1.qname == read2.qname:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
44 print read1.qname, read2.qname
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
45 raise Exception("Read Names don't match")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
46 if read1.is_unmapped and read1.is_unmapped:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
47 #Both reads don't map, don't even both saving them.
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
48 continue
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
49 if (not read1.is_unmapped and read2.is_unmapped) or (read1.is_unmapped and read2.is_unmapped):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
50 number_of_unmapped_mate_pairs += 1
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
51 continue
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
52 if read1.rname != read2.rname:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
53 different_chroms += 1
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
54 continue
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
55 randomer = read1.qname.split(":")[0]
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
56
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
57 start = stranded_read_start(read1)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
58 stop = stranded_read_start(read2)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
59 strand = "-" if read1.is_reverse else "+"
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
60 unique_location = (read1.rname, start, stop, strand, randomer)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
61 total_count[randomer] += 1
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
62 if unique_location in result_dict:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
63 removed_count[randomer] += 1
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
64 continue
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
65
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
66 result_dict[(read1.rname, start, stop, strand, randomer)] = (read1, read2)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
67
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
68 with pysam.Samfile(out_bam, 'wb', template=samfile1) as out_bam:
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
69 for key, (read1, read2) in result_dict.items():
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
70 out_bam.write(read1)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
71 out_bam.write(read2)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
72 return total_count, removed_count
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
73
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
74 if __name__ == "__main__":
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
75 description=""""Paired End randomer aware duplciate removal algorithm."""
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
76 usage="""Assumes paired end reads are adjectent to each other in output file (ie only provide unsorted bams)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
77 Also assumes no multimappers in the bam file, if there are multimappers behavior is undefined"""
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
78 parser = OptionParser(usage=usage, description=description)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
79 parser.add_option("-b", "--bam", dest="bam", help="bam file to barcode collapse")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
80 parser.add_option("-o", "--out_file", dest="out_file")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
81 parser.add_option("-m", "--metrics_file", dest="metrics_file")
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
82 (options, args) = parser.parse_args()
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
83
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
84 if not (options.bam.endswith(".bam")):
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
85 raise TypeError("%s, not bam file" % options.bam)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
86
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
87 total_count, removed_count = barcode_collapse(options.bam, options.out_file)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
88 output_metrics(options.metrics_file, total_count, removed_count)
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
89
38a8018e57a5 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/barcode_collapse commit cb3ef4df8ea5a705039b32f6959d7513462cfcfc
bgruening
parents:
diff changeset
90 sys.exit(0)