Mercurial > repos > peterjc > tmhmm_and_signalp
comparison tools/protein_analysis/tmhmm2.py @ 2:747cec3192d3
Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
| author | peterjc |
|---|---|
| date | Tue, 07 Jun 2011 17:38:43 -0400 |
| parents | 9a8a7f680dd6 |
| children | 5e62aefb2918 |
| 1:9a8a7f680dd6 | 2:747cec3192d3 |
|---|---|
| 4 This script takes exactly two command line arguments - an input protein FASTA | 4 This script takes exactly two command line arguments - an input protein FASTA |
| 5 filename and an output tabular filename. It then calls the standalone TMHMM | 5 filename and an output tabular filename. It then calls the standalone TMHMM |
| 6 v2.0 program (not the webservice) requesting the short output (one line per | 6 v2.0 program (not the webservice) requesting the short output (one line per |
| 7 protein). | 7 protein). |
| 8 | 8 |
| 9 First major feature is cleaning up the tabular output. The raw output from | 9 The first major feature is cleaning up the tabular output. The short form raw |
| 10 TMHMM v2.0 looks like this (six columns tab separated): | 10 output from TMHMM v2.0 looks like this (six columns tab separated): |
| 11 | 11 |
| 12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o | 12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o |
| 13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o | 13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o |
| 14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o | 14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o |
| 15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i | 15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i |
| | 16 |
| | 17 If there are any additional 'comment' lines starting with the hash (#) |
| | 18 character these are ignored by this script. |
| 16 | 19 |
| 17 In order to make it easier to use in Galaxy, this wrapper script simplifies | 20 In order to make it easier to use in Galaxy, this wrapper script simplifies |
| 18 this to remove the redundant tags, and instead adds a comment line at the | 21 this to remove the redundant tags, and instead adds a comment line at the |
| 19 top with the column names: | 22 top with the column names: |
| 20 | 23 |
| 53 | 56 |
| 54 def clean_tabular(raw_handle, out_handle): | 57 def clean_tabular(raw_handle, out_handle): |
| 55 """Clean up tabular TMHMM output, returns output line count.""" | 58 """Clean up tabular TMHMM output, returns output line count.""" |
| 56 count = 0 | 59 count = 0 |
| 57 for line in raw_handle: | 60 for line in raw_handle: |
| 58 if not line: | 61 if not line.strip() or line.startswith("#"): |
| | 62 #Ignore any blank lines or comment lines |
| 59 continue | 63 continue |
| 60 parts = line.rstrip("\r\n").split("\t") | 64 parts = line.rstrip("\r\n").split("\t") |
| 61 try: | 65 try: |
| 62 identifier, length, expAA, first60, predhel, topology = parts | 66 identifier, length, expAA, first60, predhel, topology = parts |
| 63 except: | 67 except: |
| 80 | 84 |
| 81 #Note that if the input FASTA file contains no sequences, | 85 #Note that if the input FASTA file contains no sequences, |
| 82 #split_fasta returns an empty list (i.e. zero temp files). | 86 #split_fasta returns an empty list (i.e. zero temp files). |
| 83 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK) | 87 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK) |
| 84 temp_files = [f+".out" for f in fasta_files] | 88 temp_files = [f+".out" for f in fasta_files] |
| 85 jobs = ["tmhmm %s > %s" % (fasta, temp) | 89 jobs = ["tmhmm -short %s > %s" % (fasta, temp) |
| 86 for fasta, temp in zip(fasta_files, temp_files)] | 90 for fasta, temp in zip(fasta_files, temp_files)] |
| 87 | 91 |
| 88 def clean_up(file_list): | 92 def clean_up(file_list): |
| 89 for f in file_list: | 93 for f in file_list: |
| 90 if os.path.isfile(f): | 94 if os.path.isfile(f): |
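
The change to the blank-line test matters because lines read by iterating over a file handle keep their trailing newline, so an empty line arrives as "\n" and `not line` is never true for it. The following is a minimal sketch of the clean-up step described in the docstring, not the script's actual code: the function name `clean_tabular_sketch` and the constant `EXPECTED_TAGS` are hypothetical, the tag list is taken from the sample output quoted above, and the column-name comment line written by the real wrapper is left out because it is not shown in this comparison.

```python
import io

# Tag prefixes assumed from the sample TMHMM short output quoted in the docstring.
EXPECTED_TAGS = ("len=", "ExpAA=", "First60=", "PredHel=", "Topology=")


def clean_tabular_sketch(raw_handle, out_handle):
    """Copy TMHMM short output to out_handle without the tags; return row count."""
    count = 0
    for line in raw_handle:
        # Same test as the new revision: skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 6:
            raise ValueError("Expected six tab-separated columns: %r" % line)
        identifier, values = parts[0], parts[1:]
        cleaned = []
        for tag, value in zip(EXPECTED_TAGS, values):
            if not value.startswith(tag):
                raise ValueError("Expected %r prefix in %r" % (tag, value))
            # Drop the redundant "name=" prefix and keep only the value.
            cleaned.append(value[len(tag):])
        out_handle.write("\t".join([identifier] + cleaned) + "\n")
        count += 1
    return count


# Usage: one comment line, one blank line, and one record from the sample output.
raw = io.StringIO(
    "# a comment line\n"
    "\n"
    "gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\tFirst60=31.17\t"
    "PredHel=3\tTopology=o23-45i52-74o89-106i\n"
)
out = io.StringIO()
rows = clean_tabular_sketch(raw, out)
```

With the old `if not line:` test, both the comment line and the blank line in this input would have reached the six-column unpacking and failed there.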
