comparison snpSift_dbnsfp.xml @ 277:6c1a0c6cf28c draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tool_collections/snpsift/snpsift_dbnsfp commit d12a2e9dd273b4c23db48bbb747f32700887710e
author iuc
date Tue, 07 Jun 2016 09:41:50 -0400
parents
children 02f54bf45aaf
comparison
276:417d227c5fef 277:6c1a0c6cf28c
1 <tool id="snpSift_dbnsfp" name="SnpSift dbNSFP" version="@WRAPPER_VERSION@.0">
2 <description>Add Annotations from dbNSFP or similar annotation DBs</description>
3 <macros>
4 <import>snpSift_macros.xml</import>
5 </macros>
6 <expand macro="requirements" />
7 <expand macro="stdio" />
8 <expand macro="version_command" />
9 <command><![CDATA[
10 java -Xmx6G -jar "\$SNPEFF_JAR_PATH/SnpSift.jar" dbnsfp -v
11 #if $db.dbsrc == 'cached':
12 -db "$db.dbnsfp"
13 #if $db.annotations and str($db.annotations) != '':
14 -f "$db.annotations"
15 #end if
16 #else:
17 -db "${db.dbnsfpdb.extra_files_path}/${db.dbnsfpdb.metadata.bgzip}"
18 #if $db.annotations and str($db.annotations) != '':
19 -f "$db.annotations"
20 #end if
21 #end if
22 "$input" > "$output"
23 2> tmp.err && grep -v file tmp.err
24 ]]>
25 </command>
26 <inputs>
27 <param name="input" type="data" format="vcf" label="Variant input file in VCF format"/>
28 <conditional name="db">
29 <param name="dbsrc" type="select" label="dbNSFP ">
30 <option value="cached">Locally installed dbNSFP database </option>
31 <option value="history">dbNSFP database from your history</option>
32 </param>
33 <when value="cached">
34 <param name="dbnsfp" type="select" label="Genome">
35 <options from_data_table="snpsift_dbnsfps">
36 <column name="name" index="2"/>
37 <column name="value" index="3"/>
38 </options>
39 </param>
40 <param name="annotations" type="select" multiple="true" display="checkboxes" label="Annotate with">
41 <options from_data_table="snpsift_dbnsfps">
42 <column name="name" index="4"/>
43 <column name="value" index="4"/>
44 <filter type="param_value" ref="dbnsfp" column="3" />
45 <filter type="multiple_splitter" column="4" separator=","/>
46 </options>
47 </param>
48 </when>
49 <when value="history">
50 <param name="dbnsfpdb" type="data" format="snpsiftdbnsfp" label="DbNSFP"/>
51 <param name="annotations" type="select" multiple="true" display="checkboxes" label="Annotate with">
52 <options>
53 <filter type="data_meta" ref="dbnsfpdb" key="annotation" />
54 </options>
55 </param>
56 </when>
57 </conditional>
58 </inputs>
59 <outputs>
60 <data format="vcf" name="output" />
61 </outputs>
62 <tests>
63 <test>
64 <param name="input" ftype="vcf" value="test_annotate_in.vcf.vcf"/>
65 <param name="dbsrc" value="history"/>
66 <param name="dbnsfpdb" value="test_dbnsfpdb.tabular" ftype="dbnsfp.tabular" />
67 <param name="annotations" value="aaref,aaalt,genename,aapos,SIFT_score"/>
68 <output name="output">
69 <assert_contents>
70 <has_text text="dbNSFP_SIFT_score=0.15" />
71 </assert_contents>
72 </output>
73 </test>
74 </tests>
75 <help><![CDATA[
76
77 dbNSFP is an integrated database of functional predictions from multiple algorithms (SIFT, Polyphen2, LRT, MutationTaster, PhyloP, GERP++, etc.).
78 It contains variant annotations such as:
79
80
81 1000Gp1_AC
82 Alternative allele counts in the whole 1000 genomes phase 1 (1000Gp1) data
83 1000Gp1_AF
84 Alternative allele frequency in the whole 1000Gp1 data
85 1000Gp1_AFR_AC
86 Alternative allele counts in the 1000Gp1 African descendent samples
87 1000Gp1_AFR_AF
88 Alternative allele frequency in the 1000Gp1 African descendent samples
89 1000Gp1_AMR_AC
90 Alternative allele counts in the 1000Gp1 American descendent samples
91 1000Gp1_AMR_AF
92 Alternative allele frequency in the 1000Gp1 American descendent samples
93 1000Gp1_ASN_AC
94 Alternative allele counts in the 1000Gp1 Asian descendent samples
95 1000Gp1_ASN_AF
96 Alternative allele frequency in the 1000Gp1 Asian descendent samples
97 1000Gp1_EUR_AC
98 Alternative allele counts in the 1000Gp1 European descendent samples
99 1000Gp1_EUR_AF
100 Alternative allele frequency in the 1000Gp1 European descendent samples
101 aaalt
102 Alternative amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron)
103 aapos
104 Amino acid position in the protein. "-1" if the variant is a splicing site SNP (2bp on each end of an intron)
105 aapos_SIFT
106 ENSP id and amino acid positions corresponding to SIFT scores. Multiple entries separated by ";"
107 aapos_FATHMM
108 ENSP id and amino acid positions corresponding to FATHMM scores. Multiple entries separated by ";"
109 aaref
110 Reference amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron)
111 alt
112 Alternative nucleotide allele (as on the + strand)
113 Ancestral_allele
114 Ancestral allele (based on 1000 genomes reference data)
115 cds_strand
116 Coding sequence (CDS) strand (+ or -)
117 chr
118 Chromosome number
119 codonpos
120 Position on the codon (1, 2 or 3)
121 Ensembl_geneid
122 Ensembl gene ID
123 Ensembl_transcriptid
124 Ensembl transcript IDs (separated by ";")
125 ESP6500_AA_AF
126 Alternative allele frequency in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set)
127 ESP6500_EA_AF
128 Alternative allele frequency in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set)
129 FATHMM_pred
130 If a FATHMM_score is <=-1.5 (or rankscore <=0.81415) the corresponding non-synonymous SNP is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". Multiple predictions separated by ";"
131 FATHMM_rankscore
132 FATHMMori scores were ranked among all FATHMMori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of FATHMMori scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1
133 FATHMM_score
134 FATHMM default score (FATHMMori)
135 fold-degenerate
136 Degenerate type (0, 2 or 3)
137 genename
138 Gene name; if the non-synonymous SNP can be assigned to multiple genes, gene names are separated by ";"
139 GERP++_NR
140 GERP++ neutral rate
141 GERP++_RS
142 GERP++ RS score, the larger the score, the more conserved the site
143 GERP++_RS_rankscore
144 GERP++ RS scores were ranked among all GERP++ RS scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GERP++ RS scores in dbNSFP
145 hg18_pos(1-coor)
146 Physical position on the chromosome according to hg18 (1-based coordinate)
147 Interpro_domain
148 Domain or conserved site on which the variant locates
149 LR_pred
150 Prediction of our LR based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. The rankscore cutoff between "D" and "T" is 0.82268
151 LR_rankscore
152 LR scores were ranked among all LR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of LR scores in dbNSFP. The scores range from 0 to 1
153 LR_score
154 Our logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1
155 LRT_Omega
156 Estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT)
157 LRT_converted_rankscore
158 LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00166 to 0.85682
159 LRT_pred
160 LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score
161 LRT_score
162 The original LRT two-sided p-value (LRTori), ranges from 0 to 1
163 MutationAssessor_pred
164 MutationAssessor's functional impact of a variant
165 MutationAssessor_rankscore
166 MAori scores were ranked among all MAori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MAori scores in dbNSFP. The scores range from 0 to 1
167 MutationAssessor_score
168 MutationAssessor functional impact combined score (MAori)
169 MutationTaster_converted_rankscore
170 The MTori scores were first converted: if the prediction is "A" or "D" MTnew=MTori; if the prediction is "N" or "P", MTnew=1-MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.0931 to 0.80722
171 MutationTaster_pred
172 MutationTaster prediction
173 MutationTaster_score
174 MutationTaster p-value (MTori), ranges from 0 to 1
175 phastCons46way_placental
176 phastCons conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
177 phastCons46way_placental_rankscore
178 phastCons46way_placental scores were ranked among all phastCons46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_placental scores in dbNSFP
179 phastCons46way_primate
180 phastCons conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
181 phastCons46way_primate_rankscore
182 phastCons46way_primate scores were ranked among all phastCons46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_primate scores in dbNSFP
183 phastCons100way_vertebrate
184 phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
185 phastCons100way_vertebrate_rankscore
186 phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP
187 phyloP46way_placental
188 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
189 phyloP46way_placental_rankscore
190 phyloP46way_placental scores were ranked among all phyloP46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_placental scores in dbNSFP
191 phyloP46way_primate
192 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
193 phyloP46way_primate_rankscore
194 phyloP46way_primate scores were ranked among all phyloP46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_primate scores in dbNSFP
195 phyloP100way_vertebrate
196 phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
197 phyloP100way_vertebrate_rankscore
198 phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP
199 Polyphen2_HDIV_pred
200 Polyphen2 prediction based on HumDiv
201 Polyphen2_HDIV_rankscore
202 Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02656 to 0.89917
203 Polyphen2_HDIV_score
204 Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
205 Polyphen2_HVAR_pred
206 Polyphen2 prediction based on HumVar
207 Polyphen2_HVAR_rankscore
208 Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01281 to 0.9711
209 Polyphen2_HVAR_score
210 Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
211 pos(1-coor)
212 Physical position on the chromosome according to hg19 (1-based coordinate)
213 RadialSVM_pred
214 Prediction of our SVM based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.83357
215 RadialSVM_rankscore
216 RadialSVM scores were ranked among all RadialSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of RadialSVM scores in dbNSFP. The scores range from 0 to 1
217 RadialSVM_score
218 Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP
219 ref
220 Reference nucleotide allele (as on the + strand)
221 refcodon
222 Reference codon
223 Reliability_index
224 Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for RadialSVM and LR. Ranges from 1 to 10. As RadialSVM and LR scores are calculated based on imputed data, the fewer missing component scores, the higher the reliability of the scores and predictions
225 SIFT_converted_rankscore
226 SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank of the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.02654 to 0.87932
227 SIFT_pred
228 If SIFTori is smaller than 0.05 (rankscore>0.55) the corresponding non-synonymous SNP is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";"
229 SIFT_score
230 SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score, the more likely the SNP has a damaging effect. Multiple scores separated by ";"
231 SiPhy_29way_logOdds
232 SiPhy score based on 29 mammalian genomes. The larger the score, the more conserved the site
233 SiPhy_29way_pi
234 The estimated stationary distribution of A, C, G and T at the site, using the SiPhy algorithm based on 29 mammalian genomes
235 SLR_test_statistic
236 SLR test statistic for testing natural selection on codons. A negative value indicates negative selection, and a positive value indicates positive selection. Larger magnitude of the value suggests stronger evidence
237 Uniprot_aapos
238 Amino acid position according to Uniprot. Multiple entries separated by ";"
239 Uniprot_acc
240 Uniprot accession number. Multiple entries separated by ";"
241 Uniprot_id
242 Uniprot ID number. Multiple entries separated by ";"
243 UniSNP_ids
244 rs numbers from UniSNP, which is a cleaned version of dbSNP build 129, in format: rs number1;rs number2;...
245
246
247 The website for the dbNSFP database is https://sites.google.com/site/jpopgen/dbNSFP and annotations are available only for human genome builds.
248
249 The procedure for preparing the dbNSFP data for use in SnpSift dbnsfp is in the SnpSift documentation:
250 *( It also provides links for dbNSFP databases prebuilt for SnpSift )*
251 http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
252
253 However, any dbNSFP-like tabular file can be used with SnpSift dbnsfp if it meets these requirements::
254
255 - The first line of the file must be column headers that name the annotations.
256 - The first 4 columns are required and must be::
257 1. chromosome
258 2. position in chromosome
259 3. reference base
260 4. alternate base
261
262 For example:
263
264 ::
265
266 #chr pos(1-coor) ref alt aaref aaalt genename SIFT_score
267 1 69134 A C E A OR4F5 0.03
268 1 69134 A G E G OR4F5 0.09
269 1 69134 A T E V OR4F5 0.03
270 4 100239319 T A H L ADH1B 0
271 4 100239319 T C H R ADH1B 0.15
272 4 100239319 T G H P ADH1B 0
273
274
275 The Galaxy datatypes for dbNSFP can automatically convert a suitably formatted tabular file for use by SnpSift dbNSFP:
276 1. Upload the tabular file, set the datatype as: **"dbnsfp.tabular"**
277 2. Edit the history dataset attributes (pencil icon): Use "Convert Format" to convert the **"dbnsfp.tabular"** to the correct format for SnpSift dbnsfp: **"snpsiftdbnsfp"**.
278
279
280 @EXTERNAL_DOCUMENTATION@
281 http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
282
283 @CITATION_SECTION@
284
285 ]]>
286 </help>
287 <expand macro="citations">
288 <citation type="doi">10.1002/humu.21517</citation>
289 <citation type="doi">10.1002/humu.22376</citation>
290 <citation type="doi">10.1002/humu.22932</citation>
291 <citation type="doi">10.1093/hmg/ddu733</citation>
292 <citation type="doi">10.1093/nar/gku1206</citation>
293 <citation type="doi">10.3389/fgene.2012.00035</citation>
294 </expand>
295 </tool>
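
The help text in this tool requires a dbNSFP-like table to begin with a header line that names the annotations, with the first four columns holding chromosome, position, reference base and alternate base. As a rough illustration only, that layout could be checked before uploading a file with a small script like the sketch below. It is not part of this wrapper or of SnpSift: the function name `read_dbnsfp_like` is hypothetical, and the sketch assumes tab-separated columns and an optional leading `#` on the header, as in the example table from the help text.

```python
def read_dbnsfp_like(text):
    """Parse a dbNSFP-like table and return (annotation_names, records).

    Hypothetical helper for illustration; assumes tab-separated columns.
    """
    # Ignore blank lines so a trailing newline does not count as a row.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input: the first line must be a header")

    # The header names the annotations; drop a leading '#' if present.
    header = lines[0].lstrip("#").split("\t")
    if len(header) < 4:
        raise ValueError(
            "header needs at least 4 columns: chromosome, position, ref, alt"
        )
    annotations = header[4:]

    records = []
    for row_no, line in enumerate(lines[1:], start=1):
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(
                "data row %d: expected %d columns, found %d"
                % (row_no, len(header), len(fields))
            )
        chrom, pos, ref, alt = fields[:4]
        if not pos.isdigit():
            raise ValueError("data row %d: position %r is not an integer" % (row_no, pos))
        # Key each record by (chromosome, position, ref, alt).
        key = (chrom, int(pos), ref, alt)
        records.append((key, dict(zip(annotations, fields[4:]))))
    return annotations, records


if __name__ == "__main__":
    example = (
        "#chr\tpos(1-coor)\tref\talt\taaref\taaalt\tgenename\tSIFT_score\n"
        "4\t100239319\tT\tC\tH\tR\tADH1B\t0.15\n"
    )
    names, rows = read_dbnsfp_like(example)
    print(names)
    print(rows)
```

The sketch deliberately does not enforce particular names for the first four columns, since the help text only fixes their order and meaning.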