comparison readme.rst @ 0:ba161910b46f draft

Uploaded
author rnateam
date Mon, 21 Oct 2013 12:27:17 -0400
parents
children d6553277b759
-1:000000000000 0:ba161910b46f
1 This package is a Galaxy workflow for the BlockClust pipeline.
2
3 It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of
4 genes to generate gene predictions on a new genome, and then calls EMBOSS
5 (Rice et al. 2000) to translate the predictions into a FASTA file of
6 predicted protein sequences. The workflow requires two input files:
7
8 * Nucleotide FASTA file of known gene sequences (training set)
9 * Nucleotide FASTA file of genome sequence or assembled contigs
10
11 First, an interpolated context model (ICM) is built from the set of known
12 genes, preferably from the closest relative organism(s) available. Next, this
13 ICM is used to predict genes on the genomic FASTA file. This produces
14 a FASTA file of the predicted gene nucleotide sequences, which is translated
15 into protein sequences using the EMBOSS tool transeq.
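
To make the final translation step concrete, here is a minimal Python sketch of translating a predicted gene sequence into protein. It is only an illustration, not how EMBOSS transeq is implemented: the function name ``translate`` is hypothetical, it assumes the standard genetic code (NCBI translation table 1) and reading frame 1, and it marks any codon containing an ambiguous base as ``X``.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order
# with the third codon position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a nucleotide string in frame 1; '*' marks a stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        # Codons with ambiguous bases (e.g. N) are reported as X.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)
```

Note that this sketch translates alternative bacterial start codons such as GTG or TTG literally (as V or L), so it may differ from tools that apply a bacterial translation table to the first codon.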
16
17 Glimmer is intended for finding genes in microbial DNA, especially bacteria,
18 archaea, and viruses.
19
20 See http://www.galaxyproject.org for information about the Galaxy Project.
21
22
23 Sample Data
24 ===========
25
26 As an example, we will use the first public assembly of the 2011 Shiga-toxin
27 producing *Escherichia coli* O104:H4 outbreak in Germany. This was part of the
28 open-source crowd-sourcing analysis described in Rohde et al. (2011) and here:
29 https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
30
31 You can upload this assembly directly into Galaxy using the "Upload File" tool
32 with either of these URLs; Galaxy should recognise this as a FASTA file with
33 3,057 sequences:
34
35 * http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
36 * https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt
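
If you would like to check the sequence count outside Galaxy, the following Python sketch counts the records in a FASTA file that has already been downloaded. The function name ``count_fasta_records`` and the local file name are assumptions made for illustration; the sketch simply counts header lines beginning with ``>``, which is how FASTA records are delimited.

```python
def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


if __name__ == "__main__":
    # Assumes the assembly was saved locally under this (hypothetical) name.
    print(count_fasta_records("TY2482.fasta.txt"))
```

For the TY2482 assembly described above, the count should agree with the number of sequences Galaxy reports.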
37
38 This FASTA file ``TY2482.fasta.txt`` was the initial TY-2482 strain assembled
39 by Nick Loman from 5 runs of Ion Torrent data released by the BGI, using the
40 MIRA 3.2 assembler. It was initially released via his blog,
41 http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/
42
43 We will also need a training set of known *E. coli* genes, for example the
44 model strain *Escherichia coli* str. K-12 substr. MG1655 which is well
45 annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
46 gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
47 should recognise as a FASTA file with 4,321 sequences:
48
49 * ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn
50
51 Then run the workflow, which should produce 2,333 predicted genes for the
52 TY2482 assembly (two FASTA files, nucleotide and protein sequences).
53
54
55 Citation
56 ========
57
58 If you use this workflow directly, or a derivative of it, or the associated
59 wrappers for Galaxy, in work leading to a scientific publication,
60 please cite:
61
62 P. Videm et al...
63
64 For Glimmer3 please cite:
65
66 Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
67 Identifying bacterial genes and endosymbiont DNA with Glimmer.
68 Bioinformatics 23(6), 673-679.
69 http://dx.doi.org/10.1093/bioinformatics/btm009
70
71 For EMBOSS please cite:
72
73 Rice, P., Longden, I. and Bleasby, A. (2000)
74 EMBOSS: The European Molecular Biology Open Software Suite
75 Trends in Genetics 16(6), 276-277.
76 http://dx.doi.org/10.1016/S0168-9525(00)02024-2
77
78
79 Additional References
80 =====================
81
82 Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
83 Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
84 New England Journal of Medicine 365, 718-724.
85 http://dx.doi.org/10.1056/NEJMoa1107643
86
87
88 Availability
89 ============
90
91 This workflow is available on the main Galaxy Tool Shed:
92
93 http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow
94
95 Development is being done on GitHub:
96
97 https://github.com/bgruening/galaxytools/workflows/glimmer3/
98
99
100 Dependencies
101 ============
102
103 These dependencies should be resolved automatically via the Galaxy Tool Shed:
104
105 * http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3
106 * http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5