Mercurial > repos > jjohnson > defuse
annotate defuse_results_to_vcf.py @ 25:2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
author | Jim Johnson <jj@umn.edu> |
---|---|
date | Fri, 09 Aug 2013 11:19:26 -0500 |
parents | |
children | d57fcac025e2 |
#!/usr/bin/env python
"""
#
#------------------------------------------------------------------------------
# University of Minnesota
# Copyright 2012, Regents of the University of Minnesota
#------------------------------------------------------------------------------
# Author:
#
# James E Johnson
# Jesse Erdmann
#
#------------------------------------------------------------------------------
"""


"""
This tool takes the defuse results.tsv tab-delimited file as input and creates a Variant Call Format file as output.
"""

import sys,re,os.path
import optparse
from optparse import OptionParser

"""
http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42

5. INFO keys used for structural variants
The following INFO keys are reserved for encoding structural variants. In general, when these keys are used by imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each allele (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=NOVEL,Number=0,Type=Flag,Description="Indicates a novel structural variation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
For precise variants, END is POS + length of REF allele - 1, and for imprecise variants the corresponding best estimate.
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
Value should be one of DEL, INS, DUP, INV, CNV, BND. This key can be derived from the REF/ALT fields but is useful for filtering.
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
For precise variants, the consensus sequence of the alternate allele assembly is derivable from the REF and ALT fields. However, the alternate allele assembly file may contain additional information about the characteristics of the alt allele contigs.
##INFO=<ID=MEINFO,Number=4,Type=String,Description="Mobile element info of the form NAME,START,END,POLARITY">
##INFO=<ID=METRANS,Number=4,Type=String,Description="Mobile element transduction info of the form CHR,START,END,POLARITY">
##INFO=<ID=DGVID,Number=1,Type=String,Description="ID of this element in Database of Genomic Variation">
##INFO=<ID=DBVARID,Number=1,Type=String,Description="ID of this element in DBVAR">
##INFO=<ID=DBRIPID,Number=1,Type=String,Description="ID of this element in DBRIP">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=PARID,Number=1,Type=String,Description="ID of partner breakend">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##INFO=<ID=CILEN,Number=2,Type=Integer,Description="Confidence interval around the length of the inserted material between breakends">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Read Depth of segment containing breakend">
##INFO=<ID=DPADJ,Number=.,Type=Integer,Description="Read Depth of adjacency">
##INFO=<ID=CN,Number=1,Type=Integer,Description="Copy number of segment containing breakend">
##INFO=<ID=CNADJ,Number=.,Type=Integer,Description="Copy number of adjacency">
##INFO=<ID=CICN,Number=2,Type=Integer,Description="Confidence interval around copy number for the segment">
##INFO=<ID=CICNADJ,Number=.,Type=Integer,Description="Confidence interval around copy number for the adjacency">
6. FORMAT keys used for structural variants
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
##FORMAT=<ID=CNL,Number=.,Type=Float,Description="Copy number genotype likelihood for imprecise events">
##FORMAT=<ID=NQ,Number=1,Type=Integer,Description="Phred style probability score that the variant is novel with respect to the genome's ancestor">
##FORMAT=<ID=HAP,Number=1,Type=Integer,Description="Unique haplotype identifier">
##FORMAT=<ID=AHAP,Number=1,Type=Integer,Description="Unique identifier of ancestral haplotype">
These keys are analogous to GT/GQ/GL and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined). CN specifies the integer copy number of the variant in this sample. CNQ is encoded as a phred quality -10 log_10 p(copy number genotype call is wrong). CNL specifies a list of log10 likelihoods for each potential copy number, starting from zero. When possible, GT/GQ/GL should be used instead of (or in addition to) these keys.

Specifying Complex Rearrangements with Breakends
An arbitrary rearrangement event can be summarized as a set of novel adjacencies.
Each adjacency ties together 2 breakends. The two breakends at either end of a novel adjacency are called mates.
There is one line of VCF (i.e. one record) for each of the two breakends in a novel adjacency. A breakend record is identified with the tag "SVTYPE=BND" in the INFO field. The REF field of a breakend record indicates a base or sequence s of bases beginning at position POS, as in all VCF records. The ALT field of a breakend record indicates a replacement for s. This "breakend replacement" has three parts:
1. The string t that replaces s. The string t may be an extended version of s if some novel bases are inserted during the formation of the novel adjacency.
2. The position p of the mate breakend, indicated by a string of the form "chr:pos". This is the location of the first mapped base in the piece being joined at this novel adjacency.
3. The direction that the joined sequence continues in, starting from p. This is indicated by the orientation of square brackets surrounding p.
These 3 elements are combined in 4 possible ways to create the ALT. In each of the 4 cases, the assertion is that s is replaced with t, and then some piece starting at position p is joined to t. The cases are:
REF  ALT   Meaning
s    t[p[  piece extending to the right of p is joined after t
s    t]p]  reverse comp piece extending left of p is joined after t
s    ]p]t  piece extending to the left of p is joined before t
s    [p[t  reverse comp piece extending right of p is joined before t

Examples:
#CHROM  POS     ID     REF  ALT           QUAL  FILTER  INFO
2       321681  bnd_W  G    G]17:198982]  6     PASS    SVTYPE=BND;MATEID=bnd_Y
2       321682  bnd_V  T    ]13:123456]T  6     PASS    SVTYPE=BND;MATEID=bnd_U
13      123456  bnd_U  C    C[2:321682[   6     PASS    SVTYPE=BND;MATEID=bnd_V
13      123457  bnd_X  A    [17:198983[A  6     PASS    SVTYPE=BND;MATEID=bnd_Z
17      198982  bnd_Y  A    A]2:321681]   6     PASS    SVTYPE=BND;MATEID=bnd_W
17      198983  bnd_Z  C    [13:123457[C  6     PASS    SVTYPE=BND;MATEID=bnd_X
"""

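The four breakend ALT forms tabulated in the specification excerpt can be sketched as a small helper. This is an illustration only and not part of defuse_results_to_vcf.py: the function name `breakend_alt` and the parameters `joined_after` and `mate_extends_right` are hypothetical names chosen for this sketch.

```python
def breakend_alt(t, mate_chrom, mate_pos, joined_after, mate_extends_right):
    """Build a VCF breakend ALT string (hypothetical helper, not from the script).

    t: replacement string for REF (REF plus any inserted bases)
    mate_chrom, mate_pos: position p of the mate breakend
    joined_after: True if the mate piece is joined after t, False if before t
    mate_extends_right: True if the joined piece extends to the right of p
    """
    # Bracket orientation encodes the direction in which the joined piece continues.
    bracket = "[" if mate_extends_right else "]"
    p = "%s%s:%d%s" % (bracket, mate_chrom, mate_pos, bracket)
    # Bracket placement relative to t encodes whether the piece follows or precedes t.
    return t + p if joined_after else p + t


# Reproduce the ALT column of four records from the specification's example.
print(breakend_alt("G", "17", 198982, True, False))   # G]17:198982]
print(breakend_alt("T", "13", 123456, False, False))  # ]13:123456]T
print(breakend_alt("C", "2", 321682, True, True))     # C[2:321682[
print(breakend_alt("A", "17", 198983, False, True))   # [17:198983[A
```

The two booleans map directly onto the two independent choices in the table: which bracket is used, and on which side of t the bracketed position is placed.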
vcf_header = """\
##fileformat=VCFv4.1
##source=defuse
##reference=%s
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=MATEID,Number=1,Type=String,Description="ID of the BND mate">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Read Depth of segment containing breakend">
##INFO=<ID=SPLITCNT,Number=1,Type=Integer,Description="number of split reads supporting the prediction">
##INFO=<ID=SPANCNT,Number=1,Type=Integer,Description="number of spanning reads supporting the fusion">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SPLICESCORE,Number=1,Type=Integer,Description="number of nucleotides similar to GTAG at fusion splice">
##INFO=<ID=GENE,Number=2,Type=String,Description="Gene Names at each breakend">
##INFO=<ID=GENEID,Number=2,Type=String,Description="Gene IDs at each breakend">
##INFO=<ID=ORF,Number=0,Type=Flag,Description="fusion combines genes in a way that preserves a reading frame">
##INFO=<ID=EXONBND,Number=0,Type=Flag,Description="fusion splice at exon boundaries">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\
"""

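A data line that follows this header is a tab-delimited record whose INFO column joins `key=value` pairs and bare flag names with semicolons. The sketch below shows one way to assemble such a line. It is an assumption about the general shape of the output, not the script's own code: `format_info` and `format_record` are hypothetical helpers, and the SPLITCNT value is made up for illustration.

```python
def format_info(pairs, flags=()):
    """Join key=value pairs and bare flag names into a VCF INFO string."""
    fields = ["%s=%s" % (key, value) for key, value in pairs]
    fields.extend(flags)
    return ";".join(fields)


def format_record(chrom, pos, rec_id, ref, alt, info, qual=".", filt="PASS"):
    """Return one tab-delimited VCF data line in the header's column order."""
    return "\t".join([chrom, str(pos), rec_id, ref, alt, str(qual), filt, info])


# Illustrative values only; the read count is not taken from any real result.
info = format_info([("SVTYPE", "BND"), ("MATEID", "bnd_Y"), ("SPLITCNT", 12)],
                   flags=["ORF"])
line = format_record("2", 321681, "bnd_W", "G", "G]17:198982]", info)
print(line)
```

Flag-type keys such as ORF and EXONBND are declared with `Number=0,Type=Flag` in the header, so they appear in INFO without a value.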
112 def cmp_alphanumeric(s1,s2): |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
113 if s1 == s2: |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
114 return 0 |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
115 a1 = re.findall("\d+|[a-zA-Z]+",s1) |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
116 a2 = re.findall("\d+|[a-zA-Z]+",s2) |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
117 for i in range(min(len(a1),len(a2))): |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
118 if a1[i] == a2[i]: |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
119 continue |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
120 if a1[i].isdigit() and a2[i].isdigit(): |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
121 return int(a1[i]) - int(a2[i]) |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
122 return 1 if a1[i] > a2[i] else -1 |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
123 return len(a1) - len(a2) |
2ecf82136986
Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff
changeset
|
124 |
def __main__():
    # VCF functions
    chr_dict = dict()
    def add_vcf_line(chr,pos,id,line):
        if chr not in chr_dict:
            pos_dict = dict()
            chr_dict[chr] = pos_dict
        if pos not in chr_dict[chr]:
            id_dict = dict()
            chr_dict[chr][pos] = id_dict
        chr_dict[chr][pos][id] = line

    def write_vcf():
        print >> outputFile, vcf_header % (refname)
        for chr in sorted(chr_dict.keys(),cmp=cmp_alphanumeric):
            for pos in sorted(chr_dict[chr].keys()):
                for id in chr_dict[chr][pos]:
                    print >> outputFile, chr_dict[chr][pos][id]
    #Parse Command Line
    parser = optparse.OptionParser()
    # files
    parser.add_option( '-i', '--input', dest='input', help='The input defuse results.tsv file (else read from stdin)' )
    parser.add_option( '-o', '--output', dest='output', help='The output vcf file (else write to stdout)' )
    parser.add_option( '-r', '--reference', dest='reference', default=None, help='The genomic reference id' )
    (options, args) = parser.parse_args()

    # results.tsv input
    if options.input != None:
        try:
            inputPath = os.path.abspath(options.input)
            inputFile = open(inputPath, 'r')
        except Exception, e:
            print >> sys.stderr, "failed: %s" % e
            exit(2)
    else:
        inputFile = sys.stdin
    # vcf output
    if options.output != None:
        try:
            outputPath = os.path.abspath(options.output)
            outputFile = open(outputPath, 'w')
        except Exception, e:
            print >> sys.stderr, "failed: %s" % e
            exit(3)
    else:
        outputFile = sys.stdout

    refname = options.reference if options.reference else 'unknown'

    svtype = 'SVTYPE=BND'
    filt = 'PASS'
    columns = []
    try:
        for linenum,line in enumerate(inputFile):
            ## print >> sys.stderr, "%d: %s\n" % (linenum,line)
            fields = line.strip().split('\t')
            if line.startswith('cluster_id'):
                columns = fields
                ## print >> sys.stderr, "columns: %s\n" % columns
                continue
            cluster_id = fields[columns.index('cluster_id')]
            gene_chromosome1 = fields[columns.index('gene_chromosome1')]
            gene_chromosome2 = fields[columns.index('gene_chromosome2')]
            genomic_strand1 = fields[columns.index('genomic_strand1')]
            genomic_strand2 = fields[columns.index('genomic_strand2')]
            gene1 = fields[columns.index('gene1')]
            gene2 = fields[columns.index('gene2')]
            gene_info = 'GENEID=%s,%s' % (gene1,gene2)
            gene_name1 = fields[columns.index('gene_name1')]
            gene_name2 = fields[columns.index('gene_name2')]
            gene_name_info = 'GENE=%s,%s' % (gene_name1,gene_name2)
            genomic_break_pos1 = int(fields[columns.index('genomic_break_pos1')])
            genomic_break_pos2 = int(fields[columns.index('genomic_break_pos2')])
            breakpoint_homology = int(fields[columns.index('breakpoint_homology')])
            homlen = 'HOMLEN=%s' % breakpoint_homology
            orf = fields[columns.index('orf')] == 'Y'
            exonboundaries = fields[columns.index('exonboundaries')] == 'Y'
            read_through = fields[columns.index('read_through')] == 'Y'
            span_count = int(fields[columns.index('span_count')])
            splitr_count = int(fields[columns.index('splitr_count')])
            splice_score = int(fields[columns.index('splice_score')])
            # list.index() raises ValueError when the column is absent, so test membership instead
            probability = fields[columns.index('probability')] if 'probability' in columns else '.'
            splitr_sequence = fields[columns.index('splitr_sequence')]
            split_seqs = splitr_sequence.split('|')
            mate_id1 = "bnd_%s_1" % cluster_id
            mate_id2 = "bnd_%s_2" % cluster_id
            ref1 = split_seqs[0][-1]
            ref2 = split_seqs[1][0]
            b1 = '[' if genomic_strand1 == '+' else ']'
            b2 = '[' if genomic_strand2 == '+' else ']'
            alt1 = "%s%s%s:%d%s" % (ref1,b2,gene_chromosome2,genomic_break_pos2,b2)
            alt2 = "%s%s:%d%s%s" % (b1,gene_chromosome1,genomic_break_pos1,b1,ref2)
            #TODO evaluate what should be included in the INFO field
            info = ['DP=%d' % (span_count + splitr_count),'SPLITCNT=%d' % splitr_count,'SPANCNT=%d' % span_count,gene_name_info,gene_info,homlen,'SPLICESCORE=%d' % splice_score]
            if orf:
                info.append('ORF')
            if exonboundaries:
                info.append('EXONBND')
            info1 = [svtype,'MATEID=%s' % mate_id2] + info
            info2 = [svtype,'MATEID=%s' % mate_id1] + info
            # list.index() raises ValueError when the column is absent, so test membership instead
            qual = int(float(fields[columns.index('probability')]) * 255) if 'probability' in columns else '.'
            vcf1 = '%s\t%d\t%s\t%s\t%s\t%s\t%s\t%s'% (gene_chromosome1,genomic_break_pos1, mate_id1, ref1, alt1, qual, filt, ';'.join(info1) )
            vcf2 = '%s\t%d\t%s\t%s\t%s\t%s\t%s\t%s'% (gene_chromosome2,genomic_break_pos2, mate_id2, ref2, alt2, qual, filt, ';'.join(info2) )
            add_vcf_line(gene_chromosome1,genomic_break_pos1,mate_id1,vcf1)
            add_vcf_line(gene_chromosome2,genomic_break_pos2,mate_id2,vcf2)
        write_vcf()
    except Exception, e:
        print >> sys.stderr, "failed: %s" % e
        exit(1)

if __name__ == "__main__" : __main__()

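The `cmp=` argument that `write_vcf` passes to `sorted` exists only in Python 2. The following is a minimal sketch, not part of the original script, of how the same chromosome ordering could be obtained under Python 3 (or 2.7) by wrapping the comparator with `functools.cmp_to_key`. The comparator is reproduced here so the block runs on its own, and the chromosome names are hypothetical values chosen for illustration.

```python
import re
from functools import cmp_to_key


def cmp_alphanumeric(s1, s2):
    """Same comparison logic as the script's cmp_alphanumeric, reproduced here."""
    if s1 == s2:
        return 0
    a1 = re.findall(r"\d+|[a-zA-Z]+", s1)
    a2 = re.findall(r"\d+|[a-zA-Z]+", s2)
    for t1, t2 in zip(a1, a2):
        if t1 == t2:
            continue
        if t1.isdigit() and t2.isdigit():
            # Both tokens are numeric: compare as integers, so 2 sorts before 10.
            return int(t1) - int(t2)
        # Otherwise fall back to plain string comparison of the tokens.
        return 1 if t1 > t2 else -1
    return len(a1) - len(a2)


# Hypothetical chromosome names, for illustration only.
chromosomes = ["chr10", "chr2", "chrX", "chr1"]

# cmp_to_key adapts an old-style comparison function to the key= interface.
ordered = sorted(chromosomes, key=cmp_to_key(cmp_alphanumeric))
print(ordered)
```

A plain `sorted(chromosomes)` would place `chr10` before `chr2`, which is why the script compares digit runs numerically.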