annotate filter-below-abund.py @ 0:0187f18785a3 draft default tip

planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/khmer/ commit 37727831a2630b7a7d4fb033366cbd772c3086c8
author iuc
date Sat, 17 Oct 2015 04:02:33 -0400
parents
children
#! /usr/bin/env python
# This file is part of khmer, https://github.com/dib-lab/khmer/, and is
# Copyright (C) 2011-2015, Michigan State University.
# Copyright (C) 2015, The Regents of the University of California.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
#     * Redistributions of source code must retain the above copyright
#       notice, this list of conditions and the following disclaimer.
#
#     * Redistributions in binary form must reproduce the above
#       copyright notice, this list of conditions and the following
#       disclaimer in the documentation and/or other materials provided
#       with the distribution.
#
#     * Neither the name of the Michigan State University nor the names
#       of its contributors may be used to endorse or promote products
#       derived from this software without specific prior written
#       permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Contact: khmer-project@idyll.org
from __future__ import print_function
import sys
import os
import khmer
from khmer.thread_utils import ThreadedSequenceProcessor, verbose_fasta_iter

WORKER_THREADS = 8
GROUPSIZE = 100

CUTOFF = 50

###


def main():
    # First argument: saved countgraph file; remaining arguments: sequence files.
    counting_ht = sys.argv[1]
    infiles = sys.argv[2:]

    print('file with ht: %s' % counting_ht)
    print('-- settings:')
    print('N THREADS', WORKER_THREADS)
    print('--')

    print('making hashtable')
    ht = khmer.load_countgraph(counting_ht)
    K = ht.ksize()

    for infile in infiles:
        print('filtering', infile)
        # Output goes to the current directory as <input basename>.below
        outfile = os.path.basename(infile) + '.below'

        def process_fn(record, ht=ht):
            name = record['name']
            seq = record['sequence']
            # Skip reads that contain ambiguous bases.
            if 'N' in seq:
                return None, None

            trim_seq, trim_at = ht.trim_below_abundance(seq, CUTOFF)

            # Keep the trimmed read only if at least one full k-mer remains.
            if trim_at >= K:
                return name, trim_seq

            return None, None

        tsp = ThreadedSequenceProcessor(process_fn, WORKER_THREADS, GROUPSIZE)

        # Close the output file once this input has been processed.
        with open(outfile, 'w') as outfp:
            tsp.start(verbose_fasta_iter(infile), outfp)


if __name__ == '__main__':
    main()
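The trimming rule used in `process_fn` can be illustrated without khmer. The sketch below is a toy stand-in, not khmer's implementation: it assumes that `trim_below_abundance` keeps the longest prefix of a read whose k-mers all have a count at or below the cutoff, and it uses an exact `Counter` of forward-strand k-mers where the script uses a countgraph loaded from disk. The function name `toy_trim_below_abundance` and the sample counts are hypothetical.

```python
from collections import Counter


def toy_trim_below_abundance(seq, counts, k, cutoff):
    """Toy model (assumption, not khmer's code): return (prefix, trim_at),
    where prefix is the longest prefix of seq whose k-mers all have
    counts[kmer] <= cutoff."""
    if len(seq) < k:
        return '', 0
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] > cutoff:
            # If the very first k-mer is too abundant, nothing is kept;
            # otherwise cut just before the last base of the offending k-mer.
            trim_at = 0 if i == 0 else i + k - 1
            return seq[:trim_at], trim_at
    return seq, len(seq)


# Hypothetical k-mer counts for k = 3; 'GTT' is the only abundant k-mer.
counts = Counter({'ACG': 1, 'CGT': 1, 'GTT': 5, 'TTA': 1})

print(toy_trim_below_abundance('ACGTTA', counts, 3, 2))  # -> ('ACGT', 4)
```

Under this model, the script's `trim_at >= K` test keeps a read only when the surviving prefix still contains at least one complete k-mer.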