Mercurial > repos > rnateam > blockclust_workflow
comparison readme.rst @ 0:ba161910b46f draft
Uploaded
author | rnateam |
---|---|
date | Mon, 21 Oct 2013 12:27:17 -0400 |
parents | |
children | d6553277b759 |
This package is a Galaxy workflow for the BlockClust pipeline.

It uses the Glimmer3 tool (Delcher et al. 2007) trained on a known set of
genes to generate gene predictions on a new genome, and then calls EMBOSS
(Rice et al. 2000) to translate the predictions into a FASTA file of
predicted protein sequences. The workflow requires two input files:

* Nucleotide FASTA file of known gene sequences (training set)
* Nucleotide FASTA file of genome sequence or assembled contigs

First, an interpolated context model (ICM) is built from the set of known
genes, preferably from the closest relative organism(s) available. Next, this
ICM is used to predict genes on the genomic FASTA file. This produces
a FASTA file of the predicted gene nucleotide sequences, which is translated
into protein sequences using the EMBOSS tool transeq.
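
To illustrate the final translation step, here is a minimal Python sketch of
frame 1 translation with the standard genetic code. It is not how transeq is
implemented, and the function name ``translate`` is hypothetical. The sketch
assumes unambiguous A/C/G/T bases, maps any other codon to ``X``, and ignores
a trailing partial codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a nucleotide sequence in frame 1; '*' marks a stop codon."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with other symbols
        for i in range(0, usable, 3)
    )


print(translate("ATGGCCTAA"))  # prints MA*
```

In the workflow itself this step is performed by transeq on the FASTA file of
predicted genes; the sketch only shows the codon-by-codon idea for one sequence.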

Glimmer is intended for finding genes in microbial DNA, especially bacteria,
archaea, and viruses.

See http://www.galaxyproject.org for information about the Galaxy Project.


Sample Data
===========

As an example, we will use the first public assembly of the 2011 Shiga-toxin
producing *Escherichia coli* O104:H4 outbreak in Germany. This was part of the
open-source crowd-sourcing analysis described in Rohde et al. (2011) and here:
https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki

You can upload this assembly directly into Galaxy using the "Upload File" tool
with either of these URLs - Galaxy should recognise this is a FASTA file with
3,057 sequences:

* http://static.xbase.ac.uk/files/results/nick/TY2482/TY2482.fasta.txt
* https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/blob/master/strains/TY2482/seqProject/BGI/assemblies/NickLoman/TY2482.fasta.txt

This FASTA file ``TY2482.fasta.txt`` was the initial TY-2482 strain assembled
by Nick Loman from 5 runs of Ion Torrent data released by the BGI, using the
MIRA 3.2 assembler. It was initially released via his blog,
http://pathogenomics.bham.ac.uk/blog/2011/06/ehec-genome-assembly/

We will also need a training set of known *E. coli* genes, for example the
model strain *Escherichia coli* str. K-12 substr. MG1655 which is well
annotated. You can upload the NCBI FASTA file ``NC_000913.ffn`` of the
gene nucleotide sequences directly into Galaxy via this URL, which Galaxy
should recognise as a FASTA file with 4,321 sequences:

* ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.ffn
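
The sequence counts quoted above (3,057 and 4,321) can also be checked outside
Galaxy on downloaded copies of the files. The following is a minimal sketch;
the helper name ``count_fasta_records`` is hypothetical, and it simply counts
header lines beginning with ``>``.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory example; for a downloaded file, pass an open file handle
# (for instance open("TY2482.fasta.txt")) instead.
example = io.StringIO(">contig1\nACGT\nACGT\n>contig2\nGGCC\n")
print(count_fasta_records(example))  # prints 2
```

The same helper could be applied to the two FASTA files produced by the
workflow, which should contain the same number of records as each other.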

Then run the workflow, which should produce 2,333 predicted genes for the
TY2482 assembly (two FASTA files, nucleotide and protein sequences).


Citation
========

If you use this workflow directly, or a derivative of it, or the associated
wrappers for Galaxy, in work leading to a scientific publication,
please cite:

P. Videm et al...

For Glimmer3 please cite:

Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007)
Identifying bacterial genes and endosymbiont DNA with Glimmer.
Bioinformatics 23(6), 673-679.
http://dx.doi.org/10.1093/bioinformatics/btm009

For EMBOSS please cite:

Rice, P., Longden, I. and Bleasby, A. (2000)
EMBOSS: The European Molecular Biology Open Software Suite
Trends in Genetics 16(6), 276-277.
http://dx.doi.org/10.1016/S0168-9525(00)02024-2


Additional References
=====================

Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N.J., et al. (2011)
Open-source genomic analysis of shiga-toxin-producing E. coli O104:H4.
New England Journal of Medicine 365, 718-724.
http://dx.doi.org/10.1056/NEJMoa1107643


Availability
============

This workflow is available on the main Galaxy Tool Shed:

http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer_gene_calling_workflow

Development is being done on GitHub:

https://github.com/bgruening/galaxytools/workflows/glimmer3/


Dependencies
============

These dependencies should be resolved automatically via the Galaxy Tool Shed:

* http://toolshed.g2.bx.psu.edu/view/bgruening/glimmer3
* http://toolshed.g2.bx.psu.edu/view/devteam/emboss_5