Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/rxlr_motifs.py @ 17:af3174637834 draft
Uploaded v0.2.5 preview 1, adding links to Tool Shed entry
| author | peterjc | 
|---|---|
| date | Thu, 02 May 2013 13:32:58 -0400 | 
| parents | 39a6e46cdda3 | 
| children | 2b35b5c4b7f4 | 
| rev | line source | 
|---|---|
| 
6
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
1 #!/usr/bin/env python | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
2 """Implements assorted RXLR motif methods from the literature | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
3 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
4 This script takes exactly four command line arguments: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
5 * Protein FASTA filename | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
6 * Number of threads | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
8 * Output tabular filename | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
9 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
10 The model names are: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
11 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
12 Bhattacharjee2006: Simple regular expression search for RXLR | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
13 with additional requirements for positioning and signal peptide. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
14 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
15 Win2007: Simple regular expression search for RXLR, but with | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
16 different positional requirements. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
17 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
18 Whisson2007: As Bhattacharjee2006 but with a more complex regular | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
19 expression to look for RXLR-EER domain, and additionally calls HMMER. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
20 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
21 See the help text in the accompanying Galaxy tool XML file for more | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
22 details including the full references. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
23 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
24 Note: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
25 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
26 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
27 which is no longer available. The current release is SignalP v3.0 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
28 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
29 the predicted cleavage site, as this is expected to be more accurate. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
30 Also note that the HMM score values have changed from v2.0 to v3.0. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
31 Whisson et al. (2007) used SignalP v3.0 anyway. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
32 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
33 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
34 can still be used with hmmsearch from HMMER 3 this this does give | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
35 slightly different results. We expect the hmmsearch from HMMER 2.3.2 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
36 (the last stable release of HMMER 2) to be present on the path under | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
37 the name hmmsearch2 (allowing it to co-exist with HMMER 3). | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
38 """ | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
39 import os | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
40 import sys | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
41 import re | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
42 import subprocess | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
43 from seq_analysis_utils import stop_err, fasta_iterator | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
44 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
45 if len(sys.argv) != 5: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
46 stop_err("Requires four arguments: protein FASTA filename, threads, model, and output filename") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
47 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
48 fasta_file, threads, model, tabular_file = sys.argv[1:] | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
49 hmm_output_file = tabular_file + ".hmm.tmp" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
50 signalp_input_file = tabular_file + ".fasta.tmp" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
51 signalp_output_file = tabular_file + ".tabular.tmp" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
52 min_signalp_hmm = 0.9 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
53 hmmer_search = "hmmsearch2" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
54 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
55 if model == "Bhattacharjee2006": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
56 signalp_trunc = 70 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
57 re_rxlr = re.compile("R.LR") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
58 min_sp = 10 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
59 max_sp = 40 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
60 max_sp_rxlr = 100 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
61 min_rxlr_start = 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
62 #Allow signal peptide to be at most 40aa, and want RXLR to be | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
63 #within 100aa, therefore for the prescreen the max start is 140: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
64 max_rxlr_start = max_sp + max_sp_rxlr | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
65 elif model == "Win2007": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
66 signalp_trunc = 70 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
67 re_rxlr = re.compile("R.LR") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
68 min_sp = 10 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
69 max_sp = 40 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
70 min_rxlr_start = 30 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
71 max_rxlr_start = 60 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
72 #No explicit limit on separation of signal peptide clevage | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
73 #and RXLR, but shortest signal peptide is 10, and furthest | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
74 #away RXLR is 60, so effectively limit is 50. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
75 max_sp_rxlr = max_rxlr_start - min_sp + 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
76 elif model == "Whisson2007": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
77 signalp_trunc = 0 #zero for no truncation | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
78 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
79 min_sp = 10 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
80 max_sp = 40 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
81 max_sp_rxlr = 100 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
82 min_rxlr_start = 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
83 max_rxlr_start = max_sp + max_sp_rxlr | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
84 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
85 stop_err("Did not recognise the model name %r\n" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
86 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
87 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
88 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
89 def get_hmmer_version(exe, required=None): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
90 cmd = "%s -h" % exe | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
91 try: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
92 child = subprocess.Popen([exe, "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
93 except OSError: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
94 raise ValueError("Could not run %s" % exe) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
95 stdout, stderr = child.communicate() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
96 if required: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
97 return required in stdout | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
98 elif "HMMER 2" in stdout: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
99 return 2 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
100 elif "HMMER 3" in stdout: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
101 return 3 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
102 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
103 raise ValueError("Could not determine version of %s" % exe) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
104 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
105 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
106 #Run hmmsearch for Whisson et al. (2007) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
107 if model == "Whisson2007": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
108 hmm_file = os.path.join(os.path.split(sys.argv[0])[0], | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
109 "whisson_et_al_rxlr_eer_cropped.hmm") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
110 if not os.path.isfile(hmm_file): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
111 stop_err("Missing HMM file for Whisson et al. (2007)") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
112 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
113 stop_err("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_searcher) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
114 #I've left the code to handle HMMER 3 in situ, in case | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
115 #we revisit the choice to insist on HMMER 2. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
116 hmmer3 = (3 == get_hmmer_version(hmmer_search)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
117 #Using zero (or 5.6?) for bitscore threshold | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
118 if hmmer3: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
119 #The HMMER3 table output is easy to parse | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
120 #In HMMER3 can't use both -T and -E | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
121 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \ | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
122 % (hmmer_search, hmm_output_file, hmm_file, fasta_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
123 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
124 #For HMMER2 we are stuck with parsing stdout | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
125 #Put 1e6 to effectively have no expectation threshold (otherwise | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
126 #HMMER defaults to 10 and the calculated e-value depends on the | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
127 #input FASTA file, and we can loose hits of interest). | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
128 cmd = "%s -T 0 -E 1e6 %s %s > %s" \ | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
129 % (hmmer_search, hmm_file, fasta_file, hmm_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
130 return_code = os.system(cmd) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
131 if return_code: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
132 stop_err("Error %i from hmmsearch:\n%s" % (return_code, cmd)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
133 hmm_hits = set() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
134 valid_ids = set() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
135 for title, seq in fasta_iterator(fasta_file): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
136 name = title.split(None,1)[0] | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
137 if name in valid_ids: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
138 stop_err("Duplicated identifier %r" % name) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
139 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
140 valid_ids.add(name) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
141 handle = open(hmm_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
142 for line in handle: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
143 if not line.strip(): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
144 #We expect blank lines in the HMMER2 stdout | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
145 continue | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
146 elif line.startswith("#"): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
147 #Header | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
148 continue | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
149 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
150 name = line.split(None,1)[0] | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
151 #Should be a sequence name in the HMMER3 table output. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
152 #Could be anything in the HMMER2 stdout. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
153 if name in valid_ids: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
154 hmm_hits.add(name) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
155 elif hmmer3: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
156 stop_err("Unexpected identifer %r in hmmsearch output" % name) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
157 handle.close() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
158 #if hmmer3: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
159 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
160 #else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
161 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
162 #print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
163 os.remove(hmm_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
164 del valid_ids | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
165 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
166 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
167 #Prepare short list of candidates containing RXLR to pass to SignalP | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
168 assert min_rxlr_start > 0, "Min value one, since zero based counting" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
169 count = 0 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
170 total = 0 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
171 handle = open(signalp_input_file, "w") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
172 for title, seq in fasta_iterator(fasta_file): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
173 total += 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
174 name = title.split(None,1)[0] | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
175 match = re_rxlr.search(seq[min_rxlr_start-1:].upper()) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
176 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
177 #This is a potential RXLR, depending on the SignalP results. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
178 #Might as well truncate the sequence now, makes the temp file smaller | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
179 if signalp_trunc: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
180 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc])) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
181 else: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
182 #Does it matter we don't line wrap? | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
183 handle.write(">%s\n%s\n" % (name, seq)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
184 count += 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
185 handle.close() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
186 #print "Running SignalP on %i/%i potentials." % (count, total) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
187 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
188 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
189 #Run SignalP (using our wrapper script to get multi-core support etc) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
190 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
191 if not os.path.isfile(signalp_script): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
192 stop_err("Error - missing signalp3.py script") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
193 cmd = "python %s euk %i %s %s %s" % (signalp_script, signalp_trunc, threads, signalp_input_file, signalp_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
194 return_code = os.system(cmd) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
195 if return_code: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
196 stop_err("Error %i from SignalP:\n%s" % (return_code, cmd)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
197 #print "SignalP done" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
198 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
199 def parse_signalp(filename): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
200 """Parse SignalP output, yield tuples of ID, HMM_Sprob_score and NN predicted signal peptide length. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
201 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
202 For signal peptide length we use NN_Ymax_pos (minus one). | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
203 """ | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
204 handle = open(filename) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
205 line = handle.readline() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
206 assert line.startswith("#ID\t"), line | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
207 for line in handle: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
208 parts = line.rstrip("\t").split("\t") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
209 assert len(parts)==20, repr(line) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
210 yield parts[0], float(parts[18]), int(parts[5])-1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
211 handle.close() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
212 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
213 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
214 #Parse SignalP results and apply the strict RXLR criteria | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
215 total = 0 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
216 tally = dict() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
217 handle = open(tabular_file, "w") | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
218 handle.write("#ID\t%s\n" % model) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
219 signalp_results = parse_signalp(signalp_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
220 for title, seq in fasta_iterator(fasta_file): | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
221 total += 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
222 rxlr = "N" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
223 name = title.split(None,1)[0] | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
224 match = re_rxlr.search(seq[min_rxlr_start-1:].upper()) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
225 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
226 del match | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
227 #This was the criteria for calling SignalP, | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
228 #so it will be in the SignalP results. | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
229 sp_id, sp_hmm_score, sp_nn_len = signalp_results.next() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
230 assert name == sp_id, "%s vs %s" % (name, sp_id) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
231 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
232 match = re_rxlr.search(seq[sp_nn_len:].upper()) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
233 if match and match.start() + 1 <= max_sp_rxlr: #1-based counting | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
234 rxlr_start = sp_nn_len + match.start() + 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
235 if min_rxlr_start <= rxlr_start <= max_rxlr_start: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
236 rxlr = "Y" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
237 if model == "Whisson2007": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
238 #Combine the signalp with regular expression heuristic and the HMM | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
239 if name in hmm_hits and rxlr == "N": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
240 rxlr = "hmm" #HMM only | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
241 elif rxlr == "N": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
242 rxlr = "neither" #Don't use N (no) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
243 elif name not in hmm_hits and rxlr == "Y": | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
244 rxlr = "re" #Heuristic only | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
245 #Now have a four way classifier: Y, hmm, re, neither | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
246 #and count is the number of Y results (both HMM and heuristic) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
247 handle.write("%s\t%s\n" % (name, rxlr)) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
248 try: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
249 tally[rxlr] += 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
250 except KeyError: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
251 tally[rxlr] = 1 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
252 handle.close() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
253 assert sum(tally.values()) == total | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
254 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
255 #Check the iterator is finished | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
256 try: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
257 signalp_results.next() | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
258 assert False, "Unexpected data in SignalP output" | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
259 except StopIteration: | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
260 pass | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
261 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
262 #Cleanup | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
263 os.remove(signalp_input_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
264 os.remove(signalp_output_file) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
265 | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
266 #Short summary to stdout for Galaxy's info display | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
267 print "%s for %i sequences:" % (model, total) | 
| 
 
39a6e46cdda3
Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
 
peterjc 
parents:  
diff
changeset
 | 
268 print ", ".join("%s = %i" % kv for kv in sorted(tally.iteritems())) | 
