annotate tools/primers/seq_primer_clip.py.orig @ 1:06e6112091aa draft

Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
author peterjc
date Tue, 30 Apr 2013 11:04:13 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
2 """Looks for the given primer sequences and clips matching SFF reads.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
3
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
4 Takes eight command line options, input read filename, input read format,
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
5 input primer FASTA filename, type of primers (forward, reverse or reverse-
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
6 complement), number of mismatches (currently only 0, 1 and 2 are supported),
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
7 minimum length to keep a read (after primer trimming), should primer-less
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
8 reads be kept (boolean), and finally the output sequence filename.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
9
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
10 Both the primer and read sequences can contain IUPAC ambiguity codes like N.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
11
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
12 This supports FASTA, FASTQ and SFF sequence files. Colorspace reads are not
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
13 supported.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
14
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
15 The mismatch parameter does not consider gapped alignemnts, however the
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
16 special case of missing bases at the very start or end of the read is handled.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
17 e.g. a primer sequence CCGACTCGAG will match a read starting CGACTCGAG...
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
18 if one or more mismatches are allowed.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
19
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
20 This can also be used for stripping off (and optionally filtering on) barcodes.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
21
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
22 Note that only the trim/clip values in the SFF file are changed, not the flow
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
23 information of the full read sequence.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
24
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
25 This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
26 (formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
27 See accompanying text file for licence details (MIT/BSD style).
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
28
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
29 This is version 0.0.8 of the script. Currently it uses Python's regular
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
30 expression engine for finding the primers, which for my needs is fast enough.
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
31 """
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
32 import sys
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
33 import re
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
34 from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
35 from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
36
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
37 if "-v" in sys.argv or "--version" in sys.argv:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
38 print "v0.0.5"
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
39 sys.exit(0)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
40
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
41 def stop_err(msg, err=1):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
42 sys.stderr.write(msg)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
43 sys.exit(err)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
44
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
45 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
46 from Bio.Seq import reverse_complement
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
47 from Bio.SeqIO.SffIO import SffIterator, SffWriter
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
48 except ImportError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
49 stop_err("Requires Biopython 1.54 or later")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
50 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
51 from Bio.SeqIO.SffIO import ReadRocheXmlManifest
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
52 except ImportError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
53 #Prior to Biopython 1.56 this was a private function
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
54 from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
55
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
56 #Parse Command Line
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
57 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
58 in_file, seq_format, primer_fasta, primer_type, mm, min_len, keep_negatives, out_file = sys.argv[1:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
59 except ValueError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
60 stop_err("Expected 8 arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
61
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
62 if in_file == primer_fasta:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
63 stop_err("Same file given as both primer sequences and sequences to clip!")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
64 if in_file == out_file:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
65 stop_err("Same file given as both sequences to clip and output!")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
66 if primer_fasta == out_file:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
67 stop_err("Same file given as both primer sequences and output!")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
68
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
69 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
70 mm = int(mm)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
71 except ValueError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
72 stop_err("Expected non-negative integer number of mismatches (e.g. 0 or 1), not %r" % mm)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
73 if mm < 0:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
74 stop_err("Expected non-negtive integer number of mismatches (e.g. 0 or 1), not %r" % mm)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
75 if mm not in [0,1,2]:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
76 raise NotImplementedError
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
77
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
78 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
79 min_len = int(min_len)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
80 except ValueError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
81 stop_err("Expected non-negative integer min_len (e.g. 0 or 1), not %r" % min_len)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
82 if min_len < 0:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
83 stop_err("Expected non-negtive integer min_len (e.g. 0 or 1), not %r" % min_len)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
84
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
85
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
86 if keep_negatives.lower() in ["true", "yes", "on"]:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
87 keep_negatives = True
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
88 elif keep_negatives.lower() in ["false", "no", "off"]:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
89 keep_negatives = False
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
90 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
91 stop_err("Expected boolean for keep_negatives (e.g. true or false), not %r" % keep_negatives)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
92
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
93
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
94 if primer_type.lower() == "forward":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
95 forward = True
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
96 rc = False
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
97 elif primer_type.lower() == "reverse":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
98 forward = False
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
99 rc = False
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
100 elif primer_type.lower() == "reverse-complement":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
101 forward = False
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
102 rc = True
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
103 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
104 stop_err("Expected foward, reverse or reverse-complement not %r" % primer_type)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
105
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
106
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
107 ambiguous_dna_values = {
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
108 "A": "A",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
109 "C": "C",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
110 "G": "G",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
111 "T": "T",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
112 "M": "ACM",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
113 "R": "AGR",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
114 "W": "ATW",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
115 "S": "CGS",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
116 "Y": "CTY",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
117 "K": "GTK",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
118 "V": "ACGMRSV",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
119 "H": "ACTMWYH",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
120 "D": "AGTRWKD",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
121 "B": "CGTSYKB",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
122 "X": ".", #faster than [GATCMRWSYKVVHDBXN] or even [GATC]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
123 "N": ".",
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
124 }
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
125
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
126 ambiguous_dna_re = {}
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
127 for letter, values in ambiguous_dna_values.iteritems():
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
128 if len(values) == 1:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
129 ambiguous_dna_re[letter] = values
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
130 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
131 ambiguous_dna_re[letter] = "[%s]" % values
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
132
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
133
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
134 def make_reg_ex(seq):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
135 return "".join(ambiguous_dna_re[letter] for letter in seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
136
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
137 def make_reg_ex_mm(seq, mm):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
138 if mm > 2:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
139 raise NotImplementedError("At most 2 mismatches allowed!")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
140 seq = seq.upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
141 yield make_reg_ex(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
142 for i in range(1,mm+1):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
143 #Missing first/last i bases at very start/end of sequence
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
144 for reg in make_reg_ex_mm(seq[i:], mm-i):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
145 yield "^" + reg
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
146 for reg in make_reg_ex_mm(seq[:-i], mm-i):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
147 yield "$" + reg
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
148 if mm >= 1:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
149 for i,letter in enumerate(seq):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
150 #We'll use a set to remove any duplicate patterns
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
151 #if letter not in "NX":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
152 pattern = seq[:i] + "N" + seq[i+1:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
153 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
154 % (pattern, len(pattern), seq, len(seq))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
155 yield make_reg_ex(pattern)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
156 if mm >=2:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
157 for i,letter in enumerate(seq):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
158 #We'll use a set to remove any duplicate patterns
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
159 #if letter not in "NX":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
160 for k,letter in enumerate(seq[i+1:]):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
161 #We'll use a set to remove any duplicate patterns
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
162 #if letter not in "NX":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
163 pattern = seq[:i] + "N" + seq[i+1:i+1+k] + "N" + seq[i+k+2:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
164 assert len(pattern) == len(seq), "Len %s is %i, len %s is %i" \
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
165 % (pattern, len(pattern), seq, len(seq))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
166 yield make_reg_ex(pattern)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
167
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
168 def load_primers_as_re(primer_fasta, mm, rc=False):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
169 #Read primer file and record all specified sequences
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
170 primers = set()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
171 in_handle = open(primer_fasta, "rU")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
172 reader = fastaReader(in_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
173 count = 0
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
174 for record in reader:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
175 if rc:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
176 seq = reverse_complement(record.sequence)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
177 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
178 seq = record.sequence
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
179 #primers.add(re.compile(make_reg_ex(seq)))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
180 count += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
181 for pattern in make_reg_ex_mm(seq, mm):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
182 primers.add(pattern)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
183 in_handle.close()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
184 #Use set to avoid duplicates, sort to have longest first
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
185 #(so more specific primers found before less specific ones)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
186 primers = sorted(set(primers), key=lambda p: -len(p))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
187 return count, re.compile("|".join(primers)) #make one monster re!
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
188
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
189
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
190
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
191 #Read primer file and record all specified sequences
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
192 count, primer = load_primers_as_re(primer_fasta, mm, rc)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
193 print "%i primer sequences" % count
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
194
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
195 short_neg = 0
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
196 short_clipped = 0
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
197 clipped = 0
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
198 negs = 0
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
199
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
200 if seq_format.lower()=="sff":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
201 #SFF is different because we just change the trim points
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
202 if forward:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
203 def process(records):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
204 global short_clipped, short_neg, clipped, negs
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
205 for record in records:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
206 left_clip = record.annotations["clip_qual_left"]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
207 right_clip = record.annotations["clip_qual_right"]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
208 seq = str(record.seq)[left_clip:right_clip].upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
209 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
210 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
211 #Forward primer, take everything after it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
212 #so move the left clip along
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
213 if len(seq) - result.end() >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
214 record.annotations["clip_qual_left"] = left_clip + result.end()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
215 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
216 yield record
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
217 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
218 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
219 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
220 if len(seq) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
221 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
222 yield record
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
223 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
224 short_neg += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
225 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
226 def process(records):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
227 global short_clipped, short_neg, clipped, negs
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
228 for record in records:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
229 left_clip = record.annotations["clip_qual_left"]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
230 right_clip = record.annotations["clip_qual_right"]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
231 seq = str(record.seq)[left_clip:right_clip].upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
232 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
233 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
234 #Reverse primer, take everything before it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
235 #so move the right clip back
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
236 new_len = result.start()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
237 if new_len >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
238 record.annotations["clip_qual_right"] = left_clip + new_len
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
239 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
240 yield record
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
241 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
242 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
243 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
244 if len(seq) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
245 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
246 yield record
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
247 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
248 short_neg += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
249
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
250 in_handle = open(in_file, "rb")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
251 try:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
252 manifest = ReadRocheXmlManifest(in_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
253 except ValueError:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
254 manifest = None
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
255 in_handle.seek(0)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
256 out_handle = open(out_file, "wb")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
257 writer = SffWriter(out_handle, xml=manifest)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
258 writer.write_file(process(SffIterator(in_handle)))
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
259 #End of SFF code
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
260 elif seq_format.lower().startswith("fastq"):
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
261 in_handle = open(in_file, "rU")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
262 out_handle = open(out_file, "w")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
263 reader = fastqReader(in_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
264 writer = fastqWriter(out_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
265 if forward:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
266 for record in reader:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
267 seq = record.sequence.upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
268 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
269 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
270 #Forward primer, take everything after it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
271 cut = result.end()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
272 record.sequence = seq[cut:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
273 if len(record.sequence) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
274 record.quality = record.quality[cut:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
275 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
276 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
277 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
278 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
279 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
280 if len(record) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
281 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
282 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
283 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
284 short_negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
285 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
286 for record in reader:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
287 seq = record.sequence.upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
288 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
289 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
290 #Reverse primer, take everything before it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
291 cut = result.start()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
292 record.sequence = seq[:cut]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
293 if len(record.sequence) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
294 record.quality = record.quality[:cut]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
295 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
296 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
297 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
298 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
299 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
300 if len(record) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
301 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
302 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
303 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
304 short_negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
305 elif seq_format.lower()=="fasta":
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
306 in_handle = open(in_file, "rU")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
307 out_handle = open(out_file, "w")
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
308 reader = fastaReader(in_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
309 writer = fastaWriter(out_handle)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
310 #Following code is identical to that for FASTQ but without editing qualities
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
311 if forward:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
312 for record in reader:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
313 seq = record.sequence.upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
314 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
315 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
316 #Forward primer, take everything after it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
317 cut = result.end()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
318 record.sequence = seq[cut:]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
319 if len(record.sequence) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
320 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
321 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
322 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
323 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
324 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
325 if len(record) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
326 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
327 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
328 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
329 short_negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
330 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
331 for record in reader:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
332 seq = record.sequence.upper()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
333 result = primer.search(seq)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
334 if result:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
335 #Reverse primer, take everything before it
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
336 cut = result.start()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
337 record.sequence = seq[:cut]
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
338 if len(record.sequence) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
339 clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
340 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
341 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
342 short_clipped += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
343 elif keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
344 if len(record) >= min_len:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
345 negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
346 writer.write(record)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
347 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
348 short_negs += 1
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
349 else:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
350 stop_err("Unsupported file type %r" % seq_format)
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
351 in_handle.close()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
352 out_handle.close()
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
353
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
354 print "Kept %i clipped reads," % clipped
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
355 print "discarded %i short." % short_clipped
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
356 if keep_negatives:
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
357 print "Kept %i non-matching reads," % negs
06e6112091aa Uploaded v0.0.9, modifies tests to cope with current Tool Shed limitation.
peterjc
parents:
diff changeset
358 print "discarded %i short." % short_neg