changeset 25:41a42022f815 draft
Uploaded v0.2.6, embedded citations
| author | peterjc | 
|---|---|
| date | Fri, 21 Nov 2014 08:17:36 -0500 | 
| parents | ee10017fcd80 | 
| children | 20139cb4c844 | 
| files | tools/protein_analysis/README.rst tools/protein_analysis/promoter2.xml tools/protein_analysis/psortb.xml tools/protein_analysis/rxlr_motifs.xml tools/protein_analysis/seq_analysis_utils.py tools/protein_analysis/signalp3.xml tools/protein_analysis/suite_config.xml tools/protein_analysis/tmhmm2.xml tools/protein_analysis/wolf_psort.xml | 
| diffstat | 9 files changed, 108 insertions(+), 79 deletions(-) |
```diff
--- a/tools/protein_analysis/README.rst Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/README.rst Fri Nov 21 08:17:36 2014 -0500
@@ -41,23 +41,23 @@

 First install those command line tools you wish to use the wrappers for:

-1. Install the command line version of SignalP 3.0 and ensure "signalp" is
+1. Install the command line version of SignalP 3.0 and ensure ``signalp`` is
    on the PATH, see: http://www.cbs.dtu.dk/services/SignalP/

-2. Install the command line version of TMHMM 2.0 and ensure "tmhmm" is on
+2. Install the command line version of TMHMM 2.0 and ensure ``tmhmm`` is on
    the PATH, see: http://www.cbs.dtu.dk/services/TMHMM/

-3. Install the command line version of Promoter 2.0 and ensure "promoter" is
+3. Install the command line version of Promoter 2.0 and ensure ``promoter`` is
    on the PATH, see: http://www.cbs.dtu.dk/services/Promoter

-4. Install the WoLF PSORT v0.2 package, and ensure "runWolfPsortSummary"
+4. Install the WoLF PSORT v0.2 package, and ensure ``runWolfPsortSummary``
    is on the PATH (we use an extra wrapper script to change to the WoLF
    PSORT directory, run runWolfPsortSummary, and then change back to the
    original directory), see: http://wolfpsort.org/WoLFPSORT_package/version0.2/

 5. Install hmmsearch from HMMER 2.3.2 (the last stable release of HMMER 2)
-   but put it on the path under the name hmmsearch2 (allowing it to co-exist
-   with HMMER 3), or edit rlxr_motif.py accordingly.
+   but put it on the path under the name ``hmmsearch2`` (allowing it to
+   co-exist with HMMER 3), or edit ``rlxr_motif.py`` accordingly.

 Verify each of the tools is installed and working from the command line
 (when logged in as the Galaxy user if appropriate).
@@ -66,37 +66,36 @@
 Manual Installation
 ===================

-1. Create a folder tools/protein_analysis under your Galaxy installation.
+1. Create a folder ``tools/protein_analysis`` under your Galaxy installation.
    This folder name is not critical, and can be changed if desired - you
-   must update the paths used in tool_conf.xml to match.
+   must update the paths used in ``tool_conf.xml`` to match.

 2. Copy/move the following files (from this archive) there:

-   * tmhmm2.xml (Galaxy tool definition)
-   * tmhmm2.py (Python wrapper script)
+   * ``tmhmm2.xml`` (Galaxy tool definition)
+   * ``tmhmm2.py`` (Python wrapper script)

-   * signalp3.xml (Galaxy tool definition)
-   * signalp3.py (Python wrapper script)
+   * ``signalp3.xml`` (Galaxy tool definition)
+   * ``signalp3.py`` (Python wrapper script)

-   * promoter2.xml (Galaxy tool definition)
-   * promoter2.py (Python wrapper script)
+   * ``promoter2.xml`` (Galaxy tool definition)
+   * ``promoter2.py`` (Python wrapper script)

-   * psortb.xml (Galaxy tool definition)
-   * psortb.py (Python wrapper script)
+   * ``psortb.xml`` (Galaxy tool definition)
+   * ``psortb.py`` (Python wrapper script)

-   * wolf_psort.xml (Galaxy tool definition)
-   * wolf_psort.py (Python wrapper script)
+   * ``wolf_psort.xml`` (Galaxy tool definition)
+   * ``wolf_psort.py`` (Python wrapper script)

-   * rxlr_motifs.xml (Galaxy tool definition)
-   * rxlr_motifs.py (Python script)
+   * ``rxlr_motifs.xml`` (Galaxy tool definition)
+   * ``rxlr_motifs.py`` (Python script)

-   * seq_analysis_utils.py (shared Python code)
-   * LICENCE
-   * README.rst (this file)
+   * ``seq_analysis_utils.py`` (shared Python code)
+   * ``LICENCE``
+   * ``README.rst`` (this file)

-3. Edit your Galaxy conjuration file tool_conf.xml (to use the tools) AND
-   also tool_conf.xml.sample (to run the tests) to include the new tools
-   by adding::
+3. Edit your Galaxy conjuration file ``tool_conf.xml`` to include the
+   new tools by adding::

      <section name="Protein sequence analysis" id="protein_analysis">
        <tool file="protein_analysis/tmhmm2.xml" />
@@ -111,22 +110,24 @@

    Leave out the lines for any tools you do not wish to use in Galaxy.

-4. Copy/move the test-data files (from this archive) to Galaxy's
-   subfolder test-data.
+4. Copy/move the ``test-data/*`` files (from this archive) to Galaxy's
+   subfolder ``test-data/``.

 5. Run the Galaxy functional tests for these new wrappers with::

-      ./run_functional_tests.sh -id tmhmm2
-      ./run_functional_tests.sh -id signalp3
-      ./run_functional_tests.sh -id Psortb
-      ./run_functional_tests.sh -id rxlr_motifs
+      $ ./run_tests.sh -id tmhmm2
+      $ ./run_tests.sh -id signalp3
+      $ ./run_tests.sh -id Psortb
+      $ ./run_tests.sh -id rxlr_motifs

-   Alternatively, this should work (assuming you left the name and id as shown in
-   the XML file tool_conf.xml.sample)::
+   Alternatively, this should work (assuming you left the seciont name and id
+   as shown above in your XML file ``tool_conf.xml``)::

-      ./run_functional_tests.sh -sid Protein_sequence_analysis-protein_analysis
+      $ ./run_tests.sh -sid Protein_sequence_analysis-protein_analysis

-   To check the section ID expected, use ./run_functional_tests.sh -list
+   To check the section ID expected, use:
+
+      $ ./run_tests.sh -list

 6. Restart Galaxy and check the new tools are shown and work.

@@ -139,7 +140,7 @@
 ------- ----------------------------------------------------------------------
 v0.0.1  - Initial release
 v0.0.2  - Corrected some typos in the help text
-        - Renamed test output file to use Galaxy convention of *.tabular
+        - Renamed test output file to use Galaxy convention of ``*.tabular``
 v0.0.3  - Check for tmhmm2 silent failures (no output)
         - Additional unit tests
 v0.0.4  - Ignore comment lines in tmhmm2 output.
@@ -150,11 +151,11 @@
 v0.0.8  - Added WoLF PSORT wrapper to the suite.
 v0.0.9  - Added our RXLR motifs tool to the suite.
 v0.1.0  - Added Promoter 2.0 wrapper (similar to SignalP & TMHMM wrappers)
-        - Support Galaxy's <parallelism> tag for SignalP, TMHMM & Promoter
+        - Support Galaxy's ``<parallelism>`` tag for SignalP, TMHMM & Promoter
 v0.1.1  - Fixed an error in the header of the tabular output from Promoter
 v0.1.2  - Use the new <stdio> settings in the XML wrappers to catch errors
-        - Use SGE style $NSLOTS for thread count (otherwise default to 4)
-v0.1.3  - Added missing file whisson_et_al_rxlr_eer_cropped.hmm to Tool Shed
+        - Use SGE style ``$NSLOTS`` for thread count (otherwise default to 4)
+v0.1.3  - Added missing file ``whisson_et_al_rxlr_eer_cropped.hmm`` to Tool Shed
 v0.2.0  - Added PSORTb wrapper to the suite, based on earlier work
           contributed by Konrad Paszkiewicz.
 v0.2.1  - Use a script to create the Tool Shed tar-ball (removed some stray
@@ -170,13 +171,16 @@
         - Adopted standard MIT licence.
         - Use reStructuredText for this README file.
         - Development moved to GitHub, https://github.com/peterjc/pico_galaxy
+v0.2.6  - Use the new ``$GALAXY_SLOTS`` environment variable for thread count.
+        - Updated the ``suite_config.xml`` file (overdue).
+        - Tool definition now embeds citation information.
 ======= ======================================================================


 Developers
 ==========

-This script and other tools are being developed on the following hg branches:
+This script and other tools were initially developed on the following hg branches:

     http://bitbucket.org/peterjc/galaxy-central/src/seq_analysis
     http://bitbucket.org/peterjc/galaxy-central/src/tools
```
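The README asks you to verify that each command line tool is installed before adding the wrappers. As a rough first check, the Python 3 sketch below (not part of this changeset) only confirms that the executable names given in the README resolve on the `PATH`; it does not run the tools, so it cannot tell you whether they actually work. Run it as the Galaxy user where appropriate. `missing_tools` is a hypothetical helper name.

```python
import shutil

# Executable names taken from the README; hmmsearch2 is HMMER 2.3.2's
# hmmsearch renamed so it can co-exist with HMMER 3.
REQUIRED_TOOLS = ["signalp", "tmhmm", "promoter", "runWolfPsortSummary", "hmmsearch2"]


def missing_tools(names, path=None):
    """Return the names that shutil.which cannot find as executables.

    If path is given (an os.pathsep-separated string) it is searched instead
    of the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables found on PATH.")
```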
```diff
--- a/tools/protein_analysis/promoter2.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/promoter2.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,13 +1,10 @@
-<tool id="promoter2" name="Promoter 2.0" version="0.0.6">
+<tool id="promoter2" name="Promoter 2.0" version="0.0.8">
     <description>Find eukaryotic PolII promoters in DNA sequences</description>
     <!-- If job splitting is enabled, break up the query file into parts -->
     <!-- Using 2000 per chunk so 4 threads each doing 500 is ideal -->
     <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
     <command interpreter="python">
-    promoter2.py "\$NSLOTS" $fasta_file $tabular_file
-    ##I want the number of threads to be a Galaxy config option...
-    ##Set the number of threads in the runner entry in universe_wsgi.ini
-    ##which (on SGE at least) will set the $NSLOTS environment variable.
+    promoter2.py "\$GALAXY_SLOTS" "$fasta_file" "$tabular_file"
     ##If the environment variable isn't set, get "", and the python wrapper
     ##defaults to four threads.
     </command>
@@ -85,4 +82,8 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
     </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="doi">10.1093/bioinformatics/15.5.356</citation>
+    </citations>
 </tool>
```
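The command line now passes `"\$GALAXY_SLOTS"` as the thread-count argument. The backslash escapes the `$` from Cheetah, so the shell expands the variable at run time, and (per the XML comment) an unset variable arrives as `""`, in which case the Python wrapper defaults to four threads. The wrapper's own parsing code is not part of this hunk, so the following is only a minimal Python 3 sketch of that fallback; `parse_thread_count` is a hypothetical name.

```python
import os

DEFAULT_THREADS = 4  # fallback described in the XML comments


def parse_thread_count(arg, default=DEFAULT_THREADS):
    """Turn the shell-expanded "$GALAXY_SLOTS" argument into a thread count.

    An empty or all-whitespace string means the variable was unset, so the
    default is returned. Anything else must be a positive integer, otherwise
    ValueError is raised.
    """
    arg = arg.strip()
    if not arg:
        return default
    try:
        threads = int(arg)
    except ValueError:
        raise ValueError("Bad thread count argument: %r" % arg) from None
    if threads < 1:
        raise ValueError("Thread count must be at least one, not %i" % threads)
    return threads


if __name__ == "__main__":
    # Mimic what the shell passes for "$GALAXY_SLOTS" (an empty string if unset).
    print(parse_thread_count(os.environ.get("GALAXY_SLOTS", "")))
```

Treating an empty argument as "use the default" keeps the same XML working whether or not the job runner exports the variable.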
```diff
--- a/tools/protein_analysis/psortb.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/psortb.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,14 +1,11 @@
-<tool id="Psortb" name="psortb" version="0.0.3">
+<tool id="Psortb" name="psortb" version="0.0.5">
     <description>Determines sub-cellular localisation of bacterial/archaeal protein sequences</description>
     <!-- If job splitting is enabled, break up the query file into parts -->
     <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
     <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
     <version_command interpreter="python">psortb.py --version</version_command>
     <command interpreter="python">
-    psortb.py "\$NSLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile"
-    ##I want the number of threads to be a Galaxy config option...
-    ##Set the number of threads in the runner entry in universe_wsgi.ini
-    ##which (on SGE at least) will set the $NSLOTS environment variable.
+    psortb.py "\$GALAXY_SLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile"
     ##If the environment variable isn't set, get "", and python wrapper
     ##defaults to four threads.
     </command>
@@ -19,9 +16,9 @@
     </stdio>
     <inputs>
         <param format="fasta" name="sequence" type="data"
-        label="Input sequences for which to predict localisation (protein FASTA format)" />
+               label="Input sequences for which to predict localisation (protein FASTA format)" />
         <param name="type" type="select"
-        label="Organism type (N.B. all sequences in the above file must be of the same type)" >
+               label="Organism type (N.B. all sequences in the above file must be of the same type)" >
             <option value="-p">Gram positive bacteria</option>
             <option value="-n">Gram negative bacteria</option>
             <option value="-a">Archaea</option>
@@ -34,11 +31,11 @@
             <option value="long">Long (verbose, tabular with about 30 columns, depending on organism type)</option>
         </param>
         <param name="cutoff" size="10" type="float" optional="true" value=""
-        label="Sets a cutoff value for reported results (e.g. 7.5)"
-        help="Leave blank or use zero for no cutoff." />
+               label="Sets a cutoff value for reported results (e.g. 7.5)"
+               help="Leave blank or use zero for no cutoff." />
         <param name="divergent" size="10" type="float" optional="true" value=""
-        label="Sets a cutoff value for the multiple localization flag (e.g. 4.5)"
-        help="Leave blank or use zero for no cutoff." />
+               label="Sets a cutoff value for the multiple localization flag (e.g. 4.5)"
+               help="Leave blank or use zero for no cutoff." />
     </inputs>
     <outputs>
         <data format="tabular" name="outfile" />
@@ -102,5 +99,9 @@

 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="doi">10.1093/bioinformatics/btq249</citation>
+    </citations>
     </help>
 </tool>
```
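Each tool definition in this changeset gains a `<citations>` block of DOIs. To list them, for example when checking which papers a tool cites, a short Python 3 sketch using the standard library's `xml.etree.ElementTree` is enough. It assumes `<citations>` is a direct child of the root `<tool>` element, as in most hunks here; a block placed inside `<help>`, as in the psortb.xml hunk above where it comes before `</help>`, would not be found. `tool_citation_dois` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def tool_citation_dois(xml_text):
    """Return the DOI strings from <citation type="doi"> elements.

    Only <citations> blocks that are direct children of the root <tool>
    element are searched. XML comments (such as the TODO notes) are dropped
    by the default parser, so they never appear in the result.
    """
    root = ET.fromstring(xml_text)
    return [
        citation.text.strip()
        for citation in root.findall("./citations/citation")
        if citation.get("type") == "doi" and citation.text
    ]


# Cut-down example using the citations added to wolf_psort.xml.
EXAMPLE = """<tool id="wolf_psort" name="WoLF PSORT" version="0.0.8">
    <help>Example help text</help>
    <citations>
        <citation type="doi">10.7717/peerj.167</citation>
        <citation type="doi">10.1093/nar/gkm259</citation>
    </citations>
</tool>"""

if __name__ == "__main__":
    print(tool_citation_dois(EXAMPLE))
```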
```diff
--- a/tools/protein_analysis/rxlr_motifs.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/rxlr_motifs.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,8 +1,7 @@
-<tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.7">
+<tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.9">
     <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description>
     <command interpreter="python">
-    rxlr_motifs.py $fasta_file 8 $model $tabular_file
-    ##I want the number of threads to be a Galaxy config option...
+    rxlr_motifs.py "$fasta_file" "\$GALAXY_SLOTS" $model "$tabular_file"
     </command>
     <stdio>
         <!-- Anything other than zero is an error -->
@@ -176,4 +175,14 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
     </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <!-- TODO - select from these citations depending on method picked -->
+        <citation type="doi">10.1038/nature06203</citation>
+        <citation type="doi">10.1105/tpc.107.051037</citation>
+        <citation type="doi">10.1371/journal.ppat.0020050</citation>
+        <citation type="doi">10.1101/gr.910003</citation>
+        <citation type="doi">10.1093/bioinformatics/14.9.755</citation>
+        <citation type="doi">10.1093/protein/10.1.1</citation>
+    </citations>
 </tool>
```
```diff
--- a/tools/protein_analysis/seq_analysis_utils.py Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/seq_analysis_utils.py Fri Nov 21 08:17:36 2014 -0500
@@ -91,6 +91,7 @@
             #between records (starting with hash).
             pass
         else:
+            handle.close()
             raise ValueError("Bad FASTA line %r" % line)
     handle.close()
     if title:
```
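The fix above closes the file handle before raising `ValueError` on a malformed FASTA line, in addition to the existing close after the loop. In Python a `with` block gives the same guarantee on every exit path. The following is a hypothetical, simplified FASTA reader (`iter_fasta` is not the function in `seq_analysis_utils.py`, whose full body is not shown here) sketching that pattern: blank and `#` comment lines are skipped, and any other text before the first `>` title raises `ValueError`.

```python
def iter_fasta(filename):
    """Yield (title, sequence) tuples from a FASTA file.

    Blank lines and lines starting with '#' are skipped. Any other line seen
    before the first '>' title line raises ValueError. Because the file is
    opened in a with-statement, it is closed whether the loop finishes
    normally or the ValueError propagates.
    """
    title = None
    seq_parts = []
    with open(filename) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if title is not None:
                    yield title, "".join(seq_parts)
                title = line[1:].strip()
                seq_parts = []
            elif not line.strip() or line.startswith("#"):
                # Blank line or comment line (e.g. between records): skip it.
                continue
            elif title is None:
                raise ValueError("Bad FASTA line %r" % line)
            else:
                seq_parts.append(line.strip())
    if title is not None:
        yield title, "".join(seq_parts)
```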
```diff
--- a/tools/protein_analysis/signalp3.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/signalp3.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,12 +1,10 @@
-<tool id="signalp3" name="SignalP 3.0" version="0.0.12">
+<tool id="signalp3" name="SignalP 3.0" version="0.0.14">
     <description>Find signal peptides in protein sequences</description>
     <!-- If job splitting is enabled, break up the query file into parts -->
     <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
     <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
     <command interpreter="python">
-    signalp3.py $organism $truncate "\$NSLOTS" $fasta_file $tabular_file
-    ##Set the number of threads in the runner entry in universe_wsgi.ini
-    ##which (on SGE at least) will set the $NSLOTS environment variable.
+    signalp3.py $organism $truncate "\$GALAXY_SLOTS" $fasta_file $tabular_file
     ##If the environment variable isn't set, get "", and the python wrapper
     ##defaults to four threads.
     </command>
@@ -197,4 +195,10 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
     </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="doi">10.1016/j.jmb.2004.05.028</citation>
+        <citation type="doi">10.1093/protein/10.1.1</citation>
+        <!-- TODO - Add bibtex entry for PMID: 9783217 -->
+    </citations>
 </tool>
```
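The `<parallelism>` comment explains the choice of `split_size="2000"`: with four threads, a 2000-sequence chunk works out at 500 sequences per thread. How the wrappers actually divide a chunk between threads is not shown in this changeset, so the following is only an illustrative Python 3 sketch of an even split; `split_into_batches` is a hypothetical helper, not code from `seq_analysis_utils.py`.

```python
def split_into_batches(records, n_batches):
    """Split a list into n_batches consecutive batches of near-equal size.

    When len(records) is not a multiple of n_batches, the first
    len(records) % n_batches batches each get one extra record.
    If n_batches exceeds len(records), the trailing batches are empty.
    """
    if n_batches < 1:
        raise ValueError("Need at least one batch, not %i" % n_batches)
    size, extra = divmod(len(records), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(records[start:end])
        start = end
    return batches


if __name__ == "__main__":
    chunk = ["seq%i" % i for i in range(2000)]  # one 2000-sequence chunk
    print([len(batch) for batch in split_into_batches(chunk, 4)])
```

Running it prints `[500, 500, 500, 500]`, matching the arithmetic in the XML comment.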
```diff
--- a/tools/protein_analysis/suite_config.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/suite_config.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,15 +1,21 @@
-    <suite id="tmhmm_and_signalp" name="Protein sequence analysis tools" version="0.0.9">
+    <suite id="tmhmm_and_signalp" name="Protein/gene sequence analysis tools" version="0.2.6">
         <description>TMHMM, SignalP, RXLR motifs, WoLF PSORT</description>
-        <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.7">
+        <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.12">
             <description>Find transmembrane domains in protein sequences</description>
         </tool>
-        <tool id="signalp3" name="SignalP 3.0" version="0.0.8">
+        <tool id="signalp3" name="SignalP 3.0" version="0.0.13">
             <description>Find signal peptides in protein sequences</description>
         </tool>
-        <tool id="wolf_psort" name="WoLF PSORT" version="0.0.1">
+        <tool id="promoter2" name="Promoter 2.0" version="0.0.7">
+            <description>Find eukaryotic PolII promoters in DNA sequences</description>
+        </tool>
+        <tool id="psortb" name="PSORTb" version="0.0.4">
+            <description>Bacteria/archaea protein subcellular localization prediction</description>
+        </tool>
+        <tool id="wolf_psort" name="WoLF PSORT" version="0.0.7">
             <description>Eukaryote protein subcellular localization prediction</description>
         </tool>
-        <tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.5">
+        <tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.8">
             <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description>
         </tool>
     </suite>
```
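The changelog notes that updating `suite_config.xml` was overdue. One way to spot drift between it and the individual tool definitions is to compare version attributes programmatically. The Python 3 sketch below is hypothetical (neither helper is part of the repository, and the example ids are made up): it reads tool ids and versions from suite XML text and reports every id whose version differs from a mapping you supply, for example one built from each tool's own `<tool ... version="...">` line. Ids are compared exactly, including case.

```python
import xml.etree.ElementTree as ET


def suite_tool_versions(suite_xml):
    """Map tool id -> version for each <tool> child of a <suite> element."""
    root = ET.fromstring(suite_xml)
    return {tool.get("id"): tool.get("version") for tool in root.findall("tool")}


def version_mismatches(suite_versions, tool_versions):
    """Return {id: (suite_version, tool_version)} wherever the two differ.

    An id present on only one side is reported with None on the other side.
    """
    mismatches = {}
    for tool_id in sorted(set(suite_versions) | set(tool_versions)):
        pair = (suite_versions.get(tool_id), tool_versions.get(tool_id))
        if pair[0] != pair[1]:
            mismatches[tool_id] = pair
    return mismatches


# Made-up example suite, not the real suite_config.xml.
EXAMPLE_SUITE = """<suite id="example" name="Example suite" version="0.0.1">
    <tool id="tool_a" name="Tool A" version="0.0.2" />
    <tool id="tool_b" name="Tool B" version="0.0.5" />
</suite>"""

if __name__ == "__main__":
    suite = suite_tool_versions(EXAMPLE_SUITE)
    print(version_mismatches(suite, {"tool_a": "0.0.2", "tool_b": "0.0.6"}))
```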
```diff
--- a/tools/protein_analysis/tmhmm2.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/tmhmm2.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,13 +1,10 @@
-<tool id="tmhmm2" name="TMHMM 2.0" version="0.0.11">
+<tool id="tmhmm2" name="TMHMM 2.0" version="0.0.13">
     <description>Find transmembrane domains in protein sequences</description>
     <!-- If job splitting is enabled, break up the query file into parts -->
     <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
     <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
     <command interpreter="python">
-    tmhmm2.py "\$NSLOTS" $fasta_file $tabular_file
-    ##I want the number of threads to be a Galaxy config option...
-    ##Set the number of threads in the runner entry in universe_wsgi.ini
-    ##which (on SGE at least) will set the $NSLOTS environment variable.
+    tmhmm2.py "\$GALAXY_SLOTS" $fasta_file $tabular_file
     ##If the environment variable isn't set, get "", and the python wrapper
     ##defaults to four threads.
     </command>
@@ -119,4 +116,9 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
     </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="doi">10.1006/jmbi.2000.4315</citation>
+        <!-- TODO - add entry for PMID: 9783223 -->
+    </citations>
 </tool>
```
```diff
--- a/tools/protein_analysis/wolf_psort.xml Tue Sep 17 12:06:15 2013 -0400
+++ b/tools/protein_analysis/wolf_psort.xml Fri Nov 21 08:17:36 2014 -0500
@@ -1,10 +1,7 @@
-<tool id="wolf_psort" name="WoLF PSORT" version="0.0.6">
+<tool id="wolf_psort" name="WoLF PSORT" version="0.0.8">
     <description>Eukaryote protein subcellular localization prediction</description>
     <command interpreter="python">
-    wolf_psort.py $organism "\$NSLOTS" "$fasta_file" "$tabular_file"
-    ##I want the number of threads to be a Galaxy config option...
-    ##Set the number of threads in the runner entry in universe_wsgi.ini
-    ##which (on SGE at least) will set the $NSLOTS environment variable.
+    wolf_psort.py $organism "\$GALAXY_SLOTS" "$fasta_file" "$tabular_file"
     ##If the environment variable isn't set, get "", and python wrapper
     ##defaults to four threads.
     </command>
@@ -150,4 +147,8 @@
 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
 http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
     </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="doi">10.1093/nar/gkm259</citation>
+    </citations>
 </tool>
```
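The README explains that WoLF PSORT is run through an extra wrapper script that changes to the WoLF PSORT directory, runs `runWolfPsortSummary`, and then changes back. In Python 3, `subprocess.run` with `cwd=` achieves the same effect without touching the caller's own working directory, so there is nothing to restore afterwards. This is a generic, hypothetical sketch: `run_in_directory` is not part of the repository, and the exact `runWolfPsortSummary` command line and install directory are not shown here, so the demo runs the Python interpreter as a stand-in.

```python
import os
import subprocess
import sys
import tempfile


def run_in_directory(command, directory, output_path, input_path=None):
    """Run command (a list of strings) with directory as its working directory.

    cwd= only applies to the child process, so the caller's working
    directory is never changed. output_path and input_path are opened here
    in the parent, so relative paths resolve against the caller's directory,
    not against directory. Returns the child's exit code.
    """
    with open(output_path, "w") as out_handle:
        if input_path is None:
            result = subprocess.run(command, cwd=directory, stdout=out_handle)
        else:
            with open(input_path) as in_handle:
                result = subprocess.run(
                    command, cwd=directory, stdin=in_handle, stdout=out_handle
                )
    return result.returncode


if __name__ == "__main__":
    # Demo with a stand-in command: the child prints its working directory.
    work_dir = tempfile.mkdtemp()
    out_path = os.path.join(tempfile.mkdtemp(), "child_cwd.txt")
    before = os.getcwd()
    code = run_in_directory(
        [sys.executable, "-c", "import os; print(os.getcwd())"], work_dir, out_path
    )
    with open(out_path) as handle:
        print("exit code:", code)
        print("child ran in:", handle.read().strip())
    print("caller directory unchanged:", os.getcwd() == before)
```

Opening the input and output files in the parent keeps Galaxy's dataset paths valid even though the tool itself runs from its own installation directory.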
