annotate fastaregexfinder.py @ 0:0c5613c6a863 draft default tip

planemo upload for repository https://github.com/bernt-matthias/mb-galaxy-tools.git commit 98943baecfa613d91dbef112fce8c6189f0431db
author matthias
date Mon, 18 Dec 2017 05:16:59 -0500
#!/usr/bin/env python

import re
import sys
import string
import argparse
import operator

VERSION='0.1.1'

parser = argparse.ArgumentParser(description=r"""

DESCRIPTION

Search a fasta file for matches to a regex and return a bed file with the
coordinates of the match and the matched sequence itself.

Output bed file has columns:
1. Name of fasta sequence (e.g. chromosome)
2. Start of the match
3. End of the match
4. ID of the match
5. Length of the match
6. Strand
7. Matched sequence as it appears on the forward strand

For matches on the reverse strand, the start and end positions and the matched
string are reported as they appear on the forward strand (so the G4 'GGGAGGGT'
present on the reverse strand is reported as ACCCTCCC).

Note: Fasta sequences (chroms) are read in memory one at a time along with the
matches for that chromosome.
The order of the output is: chroms as they are found in the input fasta, matches
sorted within chroms by positions.

EXAMPLE:
## Test data:
echo '>mychr' > /tmp/mychr.fa
echo 'ACTGnACTGnACTGnTGAC' >> /tmp/mychr.fa

fastaRegexFinder.py -f /tmp/mychr.fa -r 'ACTG'
mychr 0 4 mychr_0_4_for 4 + ACTG
mychr 5 9 mychr_5_9_for 4 + ACTG
mychr 10 14 mychr_10_14_for 4 + ACTG

fastaRegexFinder.py -f /tmp/mychr.fa -r 'ACTG' --maxstr 3
mychr 0 4 mychr_0_4_for 4 + ACT[3,4]
mychr 5 9 mychr_5_9_for 4 + ACT[3,4]
mychr 10 14 mychr_10_14_for 4 + ACT[3,4]

less /tmp/mychr.fa | fastaRegexFinder.py -f - -r 'A\w\wGn'
mychr 0 5 mychr_0_5_for 5 + ACTGn
mychr 5 10 mychr_5_10_for 5 + ACTGn
mychr 10 15 mychr_10_15_for 5 + ACTGn

DOWNLOAD
fastaRegexFinder.py is hosted at https://github.com/dariober/bioinformatics-cafe/tree/master/fastaRegexFinder

""", formatter_class= argparse.RawTextHelpFormatter)
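The reverse-strand convention described in the help text can be sketched as follows. This is an illustrative sketch written for Python 3, not the script's own code: `revcomp_simple` and `find_both_strands` are hypothetical names, the row layout omits the ID column, and only the unambiguous bases A, C, G, T are complemented.

```python
import re

# Hypothetical helper: complements only A, C, G, T (either case).
_COMP = str.maketrans('ACGTacgt', 'TGCAtgca')

def revcomp_simple(seq):
    """Reverse complement of seq, leaving other characters unchanged."""
    return seq.translate(_COMP)[::-1]

def find_both_strands(name, seq, regex):
    """Return BED-like rows for both strands in forward-strand coordinates."""
    pattern = re.compile(regex, re.IGNORECASE)
    rows = []
    # Forward strand: coordinates come straight from the match object.
    for m in pattern.finditer(seq):
        rows.append([name, m.start(), m.end(), m.end() - m.start(), '+', m.group()])
    # Reverse strand: search the reverse complement, then mirror the
    # coordinates back onto the forward strand.
    n = len(seq)
    for m in pattern.finditer(revcomp_simple(seq)):
        start = n - m.end()
        end = n - m.start()
        rows.append([name, start, end, end - start, '-', revcomp_simple(m.group())])
    return sorted(rows, key=lambda r: (r[1], r[2]))

rows = find_both_strands('mychr', 'TTACCCTCCCTT', 'GGGAGGGT')
print(rows)  # [['mychr', 2, 10, 8, '-', 'ACCCTCCC']]
```

The match 'GGGAGGGT' exists only on the reverse strand, so the row carries the '-' strand while the reported string is the forward-strand text at positions 2 to 10.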

parser.add_argument('--fasta', '-f',
    type= str,
    help='''Input fasta file to search. Use '-' to read the file from stdin.

    ''',
    required= True)

parser.add_argument('--regex', '-r',
    type= str,
    help=r'''Regex to be searched in the fasta input.
Matches to the reverse complement will have - strand.
The default regex is '([gG]{3,}\w{1,7}){3,}[gG]{3,}' which searches
for G-quadruplexes.
    ''',
    default= r'([gG]{3,}\w{1,7}){3,}[gG]{3,}')

parser.add_argument('--matchcase', '-m',
    action= 'store_true',
    help='''Match case while searching for matches. Default is
to ignore case (i.e. 'ACTG' will match 'actg').
    ''')

parser.add_argument('--noreverse',
    action= 'store_true',
    help='''Do not search the reverse complement of the input fasta.
Use this flag to search protein sequences.
    ''')

parser.add_argument('--maxstr',
    type= int,
    required= False,
    default= 10000,
    help='''Maximum length of the match to report in the 7th column of the output.
Default is to report up to 10000nt.
Truncated matches are reported as <ACTG...ACTG>[<maxstr>,<tot length>]
    ''')

parser.add_argument('--seqnames', '-s',
    type= str,
    nargs= '+',
    default= [None],
    required= False,
    help='''List of fasta sequences in --fasta to
search. E.g. use --seqnames chr1 chr2 chrM to search only these chromosomes.
Default is to search all the sequences in input.
    ''')
parser.add_argument('--quiet', '-q',
    action= 'store_true',
    help='''Do not print progress report (i.e. sequence names as they are scanned).
    ''')

parser.add_argument('--version', '-v', action='version', version='%(prog)s ' + VERSION)

args = parser.parse_args()

" --------------------------[ Check and parse arguments ]---------------------- "

if args.matchcase:
    flag= 0
else:
    flag= re.IGNORECASE

" ------------------------------[ Functions ]--------------------------------- "

def sort_table(table, cols):
    """ Code to sort list of lists
    see http://www.saltycrane.com/blog/2007/12/how-to-sort-table-by-columns-in-python/

    sort a table by multiple columns
        table: a list of lists (or tuple of tuples) where each inner list
               represents a row
        cols:  a list (or tuple) specifying the column numbers to sort by
               e.g. (1,0) would sort by column 1, then by column 0
    """
    for col in reversed(cols):
        table = sorted(table, key=operator.itemgetter(col))
    return table
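
A short usage sketch of the multi-column sort above. The function is restated so the snippet runs on its own, and the sample rows are made up for illustration. The approach relies on `sorted` being stable: sorting by the last key first and the first key last leaves the rows ordered by the first key, with ties broken by the later keys.

```python
import operator

def sort_table(table, cols):
    # Sort by the least significant column first; stability of sorted()
    # preserves that order among rows that tie on more significant columns.
    for col in reversed(cols):
        table = sorted(table, key=operator.itemgetter(col))
    return table

# Hypothetical match rows: name, start, end, strand.
rows = [['chr1', 10, 14, '+'], ['chr1', 0, 4, '-'], ['chr1', 0, 3, '+']]
by_start_end = sort_table(rows, (1, 2))
print(by_start_end)
# [['chr1', 0, 3, '+'], ['chr1', 0, 4, '-'], ['chr1', 10, 14, '+']]
```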

def trimMatch(x, n):
    """ Trim the string x to be at most length n. Trimmed matches are reported
    with the syntax ACTG[a,b] where ACTG is the beginning of x, a is the length of
    the trimmed string (e.g. 4 here) and b is the full length of the match.
    EXAMPLE:
        >>> trimMatch('ACTGNNNN', 4)
        'ACTG[4,8]'
        >>> trimMatch('ACTGNNNN', 8)
        'ACTGNNNN'
    """
    # Test n for None first so the length comparison is never made against None.
    if n is not None and len(x) > n:
        m= x[0:n] + '[' + str(n) + ',' + str(len(x)) + ']'
    else:
        m= x
    return m

def revcomp(x):
    """Reverse complement string x. Ambiguity codes are handled and case is conserved.
    Characters outside the IUPAC nucleotide codes below raise KeyError.

    Test
        x= 'ACGTRYSWKMBDHVNacgtryswkmbdhvn'
        revcomp(x)
        >>>'nbdhvkmwsryacgtNBDHVKMWSRYACGT'
    """
    # S (G or C) and W (A or T) are their own complements under IUPAC rules.
    compdict= {'A':'T',
               'C':'G',
               'G':'C',
               'T':'A',
               'R':'Y',
               'Y':'R',
               'S':'S',
               'W':'W',
               'K':'M',
               'M':'K',
               'B':'V',
               'D':'H',
               'H':'D',
               'V':'B',
               'N':'N',
               'a':'t',
               'c':'g',
               'g':'c',
               't':'a',
               'r':'y',
               'y':'r',
               's':'s',
               'w':'w',
               'k':'m',
               'm':'k',
               'b':'v',
               'd':'h',
               'h':'d',
               'v':'b',
               'n':'n'}
    xrc= []
    for n in x:
        xrc.append(compdict[n])
    xrc= ''.join(xrc)[::-1]
    return(xrc)
# -----------------------------------------------------------------------------

psq_re_f= re.compile(args.regex, flags= flag)
## psq_re_r= re.compile(regexrev)

if args.fasta != '-':
    ref_seq_fh= open(args.fasta)
else:
    ref_seq_fh= sys.stdin

ref_seq=[]
line= (ref_seq_fh.readline()).strip()
chr= re.sub('^>', '', line)
line= (ref_seq_fh.readline()).strip()
gquad_list= []
while True:
    if not args.quiet:
        sys.stderr.write('Processing %s\n' %(chr))
    # Collect sequence lines until the next header (or an empty line / end of file).
    while line.startswith('>') is False:
        ref_seq.append(line)
        line= (ref_seq_fh.readline()).strip()
        if line == '':
            break
    ref_seq= ''.join(ref_seq)
    if args.seqnames == [None] or chr in args.seqnames:
        # Forward strand: match coordinates are used as they are.
        for m in re.finditer(psq_re_f, ref_seq):
            matchstr= trimMatch(m.group(0), args.maxstr)
            quad_id= str(chr) + '_' + str(m.start()) + '_' + str(m.end()) + '_for'
            gquad_list.append([chr, m.start(), m.end(), quad_id, len(m.group(0)), '+', matchstr])
        if args.noreverse is False:
            # Reverse strand: search the reverse complement, then convert the
            # coordinates back to the forward strand.
            ref_seq= revcomp(ref_seq)
            seqlen= len(ref_seq)
            for m in re.finditer(psq_re_f, ref_seq):
                matchstr= trimMatch(revcomp(m.group(0)), args.maxstr)
                mstart= seqlen - m.end()
                mend= seqlen - m.start()
                quad_id= str(chr) + '_' + str(mstart) + '_' + str(mend) + '_rev'
                gquad_list.append([chr, mstart, mend, quad_id, len(m.group(0)), '-', matchstr])
        # Print the matches of this sequence sorted by start, end and ID.
        gquad_sorted= sort_table(gquad_list, (1,2,3))
        gquad_list= []
        for xline in gquad_sorted:
            xline= '\t'.join([str(x) for x in xline])
            print(xline)
    # The current line is the header of the next sequence.
    chr= re.sub('^>', '', line)
    ref_seq= []
    line= (ref_seq_fh.readline()).strip()
    if line == '':
        break

#gquad_sorted= sort_table(gquad_list, (0,1,2,3))
#
#for line in gquad_sorted:
#    line= '\t'.join([str(x) for x in line])
#    print(line)
sys.exit()
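
The reverse-strand handling in the loop above can be illustrated with a small standalone sketch. It is not part of the script: the names `revcomp_sketch` and `find_both_strands` are hypothetical, the sequence and pattern are made up, and `str.translate` is swapped in for the per-character dictionary loop (so it assumes Python 3). It only shows how a match at `[s, e)` on the reverse complement maps to `[len - e, len - s)` on the forward strand, and that the reported match text is the forward-strand sequence.

```python
import re

# Hypothetical translation table for illustration. Codes not listed here
# (S, W, N and anything else) are left unchanged by str.translate.
_COMP = str.maketrans('ACGTRYKMBDHVacgtrykmbdhv',
                      'TGCAYRMKVHDBtgcayrmkvhdb')


def revcomp_sketch(seq):
    """Reverse complement seq, conserving case."""
    return seq.translate(_COMP)[::-1]


def find_both_strands(name, seq, pattern):
    """Return rows [name, start, end, strand, match] in forward coordinates."""
    regex = re.compile(pattern)
    rows = [[name, m.start(), m.end(), '+', m.group(0)]
            for m in regex.finditer(seq)]
    seqlen = len(seq)
    for m in regex.finditer(revcomp_sketch(seq)):
        # Convert reverse-complement coordinates back to the forward strand
        # and report the forward-strand text of the match.
        rows.append([name, seqlen - m.end(), seqlen - m.start(), '-',
                     revcomp_sketch(m.group(0))])
    return sorted(rows, key=lambda r: (r[1], r[2]))


rows = find_both_strands('chr1', 'AAGGGTTCCCAA', 'GGG')
for row in rows:
    print('\t'.join(str(x) for x in row))
```

With this input the forward match `GGG` sits at 2..5, and the same pattern found on the reverse complement is reported at 7..10 on the `-` strand, where the forward sequence reads `CCC`.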