annotate tools/protein_analysis/tmhmm2.py @ 29:3cb02adf4326 draft
v0.2.9 Python style improvements
| author | peterjc |
|---|---|
| date | Wed, 01 Feb 2017 09:46:14 -0500 |
| parents | 20139cb4c844 |
| children | 6d9d7cdf00fc |

#!/usr/bin/env python
"""Wrapper for TMHMM v2.0 for use in Galaxy.

This script takes exactly three command line arguments - number of threads,
an input protein FASTA filename, and an output tabular filename. It then
calls the standalone TMHMM v2.0 program (not the webservice) requesting
the short output (one line per protein).

The first major feature is cleaning up the tabular output. The short form raw
output from TMHMM v2.0 looks like this (six columns tab separated):

gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i

If there are any additional 'comment' lines starting with the hash (#)
character these are ignored by this script.

In order to make it easier to use in Galaxy, this wrapper script simplifies
this to remove the redundant tags, and instead adds a comment line at the
top with the column names:

#ID len ExpAA First60 PredHel Topology
gi|2781234|pdb|1JLY|B 304 0.01 0.00 0 o
gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o
gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i

The second major potential feature is taking advantage of multiple cores
(since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file
into chunks and running multiple copies of TMHMM in parallel. I would normally
use Python's multiprocessing library in this situation but it requires at
least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.

Note that this is somewhat redundant with job-splitting available in Galaxy
itself (see the SignalP XML file for settings).

Also tmhmm2 can fail without returning an error code, for example if run on a
64 bit machine with only the 32 bit binaries installed. This script will spot
when there is no output from tmhmm2, and raise an error.
"""
import sys
import os
import tempfile
from seq_analysis_utils import split_fasta, run_jobs, thread_count

FASTA_CHUNK = 500

if len(sys.argv) != 4:
    sys.exit("Require three arguments, number of threads (int), input protein FASTA file & output tabular file")

num_threads = thread_count(sys.argv[1], default=4)
fasta_file = sys.argv[2]
tabular_file = sys.argv[3]

tmp_dir = tempfile.mkdtemp()


def clean_tabular(raw_handle, out_handle):
    """Clean up tabular TMHMM output, returns output line count."""
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            # Ignore any blank lines or comment lines
            continue
        parts = line.rstrip("\r\n").split("\t")
        try:
            identifier, length, exp_aa, first60, predhel, topology = parts
        except ValueError:
            assert len(parts) != 6
            sys.exit("Bad line: %r" % line)
        assert length.startswith("len="), line
        length = length[4:]
        assert exp_aa.startswith("ExpAA="), line
        exp_aa = exp_aa[6:]
        assert first60.startswith("First60="), line
        first60 = first60[8:]
        assert predhel.startswith("PredHel="), line
        predhel = predhel[8:]
        assert topology.startswith("Topology="), line
        topology = topology[9:]
        out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n"
                         % (identifier, length, exp_aa, first60, predhel, topology))
        count += 1
    return count

# Note that if the input FASTA file contains no sequences,
# split_fasta returns an empty list (i.e. zero temp files).
fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
jobs = ["tmhmm -short %s > %s" % (fasta, temp)
        for fasta, temp in zip(fasta_files, temp_files)]


def clean_up(file_list):
    """Remove temp files, and if possible the temp directory."""
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)
    try:
        os.rmdir(tmp_dir)
    except Exception:
        pass

if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    # Parenthesised single-argument form behaves the same on Python 2 and 3.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    if error_level:
        try:
            output = open(temp).readline()
        except IOError:
            output = ""
        clean_up(fasta_files + temp_files)
        # sys.exit() accepts at most one argument, so report the message
        # on stderr and then exit with the failing task's return code.
        sys.stderr.write("One or more tasks failed, e.g. %i from %r gave:\n%s\n"
                         % (error_level, cmd, output))
        sys.exit(error_level)
del results
del jobs

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
for temp in temp_files:
    data_handle = open(temp)
    count = clean_tabular(data_handle, out_handle)
    data_handle.close()
    if not count:
        clean_up(fasta_files + temp_files)
        sys.exit("No output from tmhmm2")
out_handle.close()

clean_up(fasta_files + temp_files)
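
The tag-stripping step described in the docstring can also be tried in isolation, without TMHMM or Galaxy installed. The following is a minimal sketch, not part of the wrapper itself: the names `EXPECTED_TAGS`, `parse_tmhmm_short_line` and `clean_line` are hypothetical, and it assumes the six-column, tab-separated short format shown in the docstring. Unlike the script, it raises `ValueError` rather than using `assert`, so the checks would still run under `python -O`.

```python
# Hypothetical helper names; the column tags are taken from the docstring example.
EXPECTED_TAGS = ("len", "ExpAA", "First60", "PredHel", "Topology")


def parse_tmhmm_short_line(line):
    """Split one raw 'tmhmm -short' line into (identifier, dict of tag values)."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 6:
        raise ValueError("Expected 6 tab separated columns, got %i: %r"
                         % (len(parts), line))
    identifier = parts[0]
    fields = {}
    for expected, part in zip(EXPECTED_TAGS, parts[1:]):
        # Each remaining column should look like "tag=value".
        tag, sep, value = part.partition("=")
        if not sep or tag != expected:
            raise ValueError("Expected %s=... but got %r" % (expected, part))
        fields[tag] = value
    return identifier, fields


def clean_line(line):
    """Return the simplified tab separated line with the redundant tags removed."""
    identifier, fields = parse_tmhmm_short_line(line)
    return "\t".join([identifier] + [fields[tag] for tag in EXPECTED_TAGS])


raw = ("gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\tFirst60=31.17"
       "\tPredHel=3\tTopology=o23-45i52-74o89-106i\n")
cleaned = clean_line(raw)
```

Here `cleaned` holds the identifier followed by the five bare values, which is the layout the wrapper writes under its `#ID` header line. Keeping the values as strings mirrors the script, which copies them through unchanged rather than converting them to numbers.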
