annotate tools/protein_analysis/tmhmm2.py @ 29:3cb02adf4326 draft

v0.2.9 Python style improvements
author peterjc
date Wed, 01 Feb 2017 09:46:14 -0500
parents 20139cb4c844
children 6d9d7cdf00fc

#!/usr/bin/env python
"""Wrapper for TMHMM v2.0 for use in Galaxy.

This script takes exactly three command line arguments: the number of threads,
an input protein FASTA filename, and an output tabular filename. It then
calls the standalone TMHMM v2.0 program (not the webservice) requesting
the short output (one line per protein).

The first major feature is cleaning up the tabular output. The short form raw
output from TMHMM v2.0 looks like this (six columns, tab separated):

gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i

Any blank lines, and any additional 'comment' lines starting with the hash (#)
character, are ignored by this script.

In order to make it easier to use in Galaxy, this wrapper script simplifies
this by removing the redundant tags, and instead adds a comment line at the
top with the column names:

#ID len ExpAA First60 PredHel Topology
gi|2781234|pdb|1JLY|B 304 0.01 0.00 0 o
gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o
gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i
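
As an illustration of the tag stripping, here is a minimal sketch that handles
one tab-separated line of tmhmm -short output at a time. The clean_line helper
and the TAGS list are hypothetical, not part of this script. Where
clean_tabular() below uses assertions and sys.exit(), this sketch raises
ValueError instead.

```python
# Hypothetical helper for illustration only; clean_tabular() in this script
# does the same job using assertions and sys.exit().
TAGS = ["", "len=", "ExpAA=", "First60=", "PredHel=", "Topology="]


def clean_line(line):
    # Split one tab-separated line of tmhmm -short output and strip the
    # "len=", "ExpAA=", ... prefixes, returning the six bare values.
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != len(TAGS):
        raise ValueError("Bad line: %r" % line)
    values = []
    for tag, part in zip(TAGS, parts):
        # The identifier column has no tag; "" is a prefix of any string.
        if not part.startswith(tag):
            raise ValueError("Expected %r prefix in %r" % (tag, part))
        values.append(part[len(tag):])
    return values
```

Applied to the first raw line above, this yields the six values shown in the
first cleaned line.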

The second major potential feature is taking advantage of multiple cores
(since TMHMM v2.0 itself is single-threaded) by dividing the input FASTA file
into chunks and running multiple copies of TMHMM in parallel. I would normally
use Python's multiprocessing library in this situation, but it requires at
least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
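
A minimal sketch of the split-and-run idea follows. It assumes a modern
Python 3 (concurrent.futures, not the Python 2.4-compatible approach discussed
above) and assumes each chunk is processed by one shell command string. The
names chunk_records and run_commands are hypothetical. They are not the
split_fasta and run_jobs helpers from seq_analysis_utils that this script
imports. Threads are enough here because the real work happens in the child
processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk_records(records, chunk_size):
    # Group records (e.g. (title, sequence) pairs) into lists of at most
    # chunk_size items, one list per temporary FASTA file.
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def run_commands(commands, threads):
    # Run each shell command in its own child process, with at most
    # `threads` running at once, and return a dict of command -> exit code.
    # Commands are assumed to be unique (each names its own chunk file).
    with ThreadPoolExecutor(max_workers=max(1, threads)) as pool:
        codes = list(pool.map(lambda cmd: subprocess.call(cmd, shell=True), commands))
    return dict(zip(commands, codes))
```

In this script each command has the form tmhmm -short <chunk> > <chunk>.out,
and a non-zero exit code from any chunk aborts the whole run.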

Note that this is somewhat redundant with the job splitting available in
Galaxy itself (see the SignalP XML file for settings).

Also, tmhmm2 can fail without returning an error code, for example if run on a
64-bit machine with only the 32-bit binaries installed. This script detects
when there is no output from tmhmm2 and raises an error.
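
A minimal sketch of that check follows. It assumes each chunk's output has
already been read into a list of lines, keyed by output file name. The helpers
count_records and check_chunk_outputs are hypothetical. The script itself
relies on the line count returned by clean_tabular() and calls sys.exit()
rather than raising an exception.

```python
def count_records(lines):
    # Count result lines, skipping blank and '#' comment lines in the same
    # way clean_tabular() does.
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def check_chunk_outputs(outputs):
    # outputs maps a chunk's output file name to its list of lines. An empty
    # chunk means tmhmm2 failed, even if it exited with status 0.
    for name, lines in outputs.items():
        if not count_records(lines):
            raise RuntimeError("No output from tmhmm2 for %s" % name)
```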
"""
import sys
import os
import tempfile
from seq_analysis_utils import split_fasta, run_jobs, thread_count

FASTA_CHUNK = 500

if len(sys.argv) != 4:
    sys.exit("Require three arguments, number of threads (int), input protein FASTA file & output tabular file")

num_threads = thread_count(sys.argv[1], default=4)
fasta_file = sys.argv[2]
tabular_file = sys.argv[3]

tmp_dir = tempfile.mkdtemp()


def clean_tabular(raw_handle, out_handle):
    """Clean up tabular TMHMM output, returns output line count."""
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            # Ignore any blank lines or comment lines
            continue
        parts = line.rstrip("\r\n").split("\t")
        try:
            identifier, length, exp_aa, first60, predhel, topology = parts
        except ValueError:
            assert len(parts) != 6
            sys.exit("Bad line: %r" % line)
        assert length.startswith("len="), line
        length = length[4:]
        assert exp_aa.startswith("ExpAA="), line
        exp_aa = exp_aa[6:]
        assert first60.startswith("First60="), line
        first60 = first60[8:]
        assert predhel.startswith("PredHel="), line
        predhel = predhel[8:]
        assert topology.startswith("Topology="), line
        topology = topology[9:]
        out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n"
                         % (identifier, length, exp_aa, first60, predhel, topology))
        count += 1
    return count

# Note that if the input FASTA file contains no sequences,
# split_fasta returns an empty list (i.e. zero temp files).
fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
jobs = ["tmhmm -short %s > %s" % (fasta, temp)
        for fasta, temp in zip(fasta_files, temp_files)]


def clean_up(file_list):
    """Remove temp files, and if possible the temp directory."""
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)
    try:
        os.rmdir(tmp_dir)
    except Exception:
        pass

if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    if error_level:
        try:
            output = open(temp).readline()
        except IOError:
            output = ""
        clean_up(fasta_files + temp_files)
        # sys.exit() accepts at most one argument, so report the message on
        # stderr, then exit with the failing task's return code.
        sys.stderr.write("One or more tasks failed, e.g. %i from %r gave:\n%s\n"
                         % (error_level, cmd, output.rstrip()))
        sys.exit(error_level)
del results
del jobs

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
for temp in temp_files:
    data_handle = open(temp)
    count = clean_tabular(data_handle, out_handle)
    data_handle.close()
    if not count:
        clean_up(fasta_files + temp_files)
        sys.exit("No output from tmhmm2")
out_handle.close()

clean_up(fasta_files + temp_files)