comparison transit_resampling.xml @ 4:7288ac4e8583 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/transit/ commit b8111d3ebede6ee71b18fbbabc708cdeab340912-dirty
author dave
date Wed, 03 Apr 2019 14:43:41 -0400
parents b33af081b02e
children 3fcb70c1ca78
3:b33af081b02e 4:7288ac4e8583
39 <param name="inputs" ftype="wig" value="transit-in1-rep1.wig,transit-in1-rep2.wig" /> 39 <param name="inputs" ftype="wig" value="transit-in1-rep1.wig,transit-in1-rep2.wig" />
40 <param name="controls" ftype="wig" value="transit-co1-rep1.wig,transit-co1-rep2.wig,transit-co1-rep3.wig" /> 40 <param name="controls" ftype="wig" value="transit-co1-rep1.wig,transit-co1-rep2.wig,transit-co1-rep3.wig" />
41 <param name="annotation" ftype="gff3" value="transit-in1.gff3" /> 41 <param name="annotation" ftype="gff3" value="transit-in1.gff3" />
42 <param name="samples" value="1000" /> 42 <param name="samples" value="1000" />
43 <param name="burnin" value="100" /> 43 <param name="burnin" value="100" />
44 <param name="replicates" value="Replicates" />
44 <output name="sites" file="resampling-sites1.txt" ftype="tabular" compare="sim_size" /> 45 <output name="sites" file="resampling-sites1.txt" ftype="tabular" compare="sim_size" />
45 </test> 46 </test>
46 </tests> 47 </tests>
47 48
48 <help> 49 <help><![CDATA[
49 <![CDATA[.. class:: infomark
50
51 **What it does**
52
53 -------------------
54 50
55 51
56 The re-sampling method is a comparative analysis that can be used to determine the conditional essentiality of genes. It is based on a permutation test and is capable of identifying genes whose read-counts differ significantly across conditions. 52 .. class:: infomark
57 53
58 This technique has yet to be formally published in the context of differential essentiality analysis. Briefly, the read-counts at each gene are determined for each replicate of each condition. The total read-count in condition A is subtracted from the total read-count in condition B to obtain an observed difference in read-counts. The read-counts at the TA sites are then randomly permuted between the two conditions for a given number of “samples”. For each of these permutations, the difference in read-counts is determined. This forms a null distribution, from which a p-value is calculated for the original, observed difference in read-counts. 54 **What it does**
59 55
60 56 -------------------
61 Note: Can be used for both Himar1 and Tn5 datasets.
62
63
64 -------------------
65
66 **Inputs**
67
68 -------------------
69
70 Input files for Resampling need to be:
71
72 - .wig files : Tabulated files containing one column with the TA site coordinate and one column with the read count at this site.
73 - annotation .prot_table : Annotation file generated by the `Convert Gff3 to prot_table for TRANSIT` tool.
74
75
76 -------------------
77
78 **Parameters**
79
80 -------------------
81
82 Optional Arguments:
83
84 -s <integer> := Number of samples. Default: 10000
85 -n <string> := Normalization method. Default: TTR
86 -h := Output histogram of the permutations for each gene. Default: Off.
87 -a := Perform adaptive resampling. Default: Off.
88 -ez := Exclude rows with zero across conditions. Default: Off
89 --pc := Pseudocounts to be added at each site.
90 -l := Perform LOESS Correction; Helps remove possible genomic position bias. Default: Off.
91 --iN <float> := Ignore TAs occurring at given fraction of the N terminus. Default: 0.0
92 --iC <float> := Ignore TAs occurring at given fraction of the C terminus. Default: 0.0
93 --ctrl_lib := String of letters representing library of control files in order e.g. 'AABB': Default empty. Letters used must also be used in --exp_lib. If non-empty, resampling will limit permutations to within-libraries.
94 --exp_lib := String of letters representing library of experimental files in order e.g. 'ABAB': Default empty. Letters used must also be used in --ctrl_lib. If non-empty, resampling will limit permutations to within-libraries.
95
96
97 The resampling method is non-parametric, and therefore does not require any parameters governing the distributions or the model. The following parameters are available for the method:
98
99 - Samples: The number of samples (permutations) to perform. The larger the number of samples, the finer the resolution of the calculated p-values, at the expense of longer computation time. The re-sampling method runs on 10,000 samples by default.
100 - Output Histograms: Determines whether to output .png images of the histograms obtained from resampling the difference in read-counts.
101 - Adaptive Resampling: An optional “adaptive” version of resampling that accelerates the calculation by terminating early for genes that are likely not significant. This dramatically speeds up the computation at the cost of less accurate estimates for those genes that terminate early (i.e. deemed not significant). This option is OFF by default.
102 - Include Zeros: Select to include sites that are zero. This is the preferred behavior; however, unselecting this (thus ignoring sites that are zero across all datasets, i.e. completely empty) is useful for decreasing running time (especially for large datasets like Tn5).
103 - Normalization Method: Determines which normalization method to use when comparing datasets. Proper normalization is important as it ensures that other sources of variability are not mistakenly treated as real differences. See the Normalization section of the TRANSIT documentation for a description of the normalization methods available in TRANSIT.
104
105
106 -------------------
107
108 **Outputs**
109
110 -------------------
111
112 The re-sampling method outputs a tab-delimited file with results for each gene in the genome. P-values are adjusted for multiple comparisons using the Benjamini-Hochberg procedure (called “q-values” or “p-adj.”). A typical threshold for conditional essentiality is q-value < 0.05.
113
114
115 ============================================= ========================================================================================================================
116 **Column Header** **Column Definition**
117 --------------------------------------------- ------------------------------------------------------------------------------------------------------------------------
118 Orf Gene ID
119 Name Gene Name
120 Desc Gene Description
121 N Number of TA sites in the gene.
122 TAs Hit Number of TA sites with at least one insertion.
123 Sum Rd 1 Sum of read counts in condition 1.
124 Sum Rd 2 Sum of read counts in condition 2.
125 Delta Rd Difference in the sum of read counts.
126 p-value P-value calculated by the permutation test.
127 p-adj. Adjusted p-value controlling for the FDR (Benjamini-Hochberg)
128 ============================================= ========================================================================================================================
129 57
130 58
131 59
132 ------------------- 60 The re-sampling method is a comparative analysis that can be used to determine the conditional essentiality of genes. It is based on a permutation test and is capable of identifying genes whose read-counts differ significantly across conditions.
133 61
134 **More Information** 62 This technique has yet to be formally published in the context of differential essentiality analysis. Briefly, the read-counts at each gene are determined for each replicate of each condition. The total read-count in condition A is subtracted from the total read-count in condition B to obtain an observed difference in read-counts. The read-counts at the TA sites are then randomly permuted between the two conditions for a given number of “samples”. For each of these permutations, the difference in read-counts is determined. This forms a null distribution, from which a p-value is calculated for the original, observed difference in read-counts.
135 63
136 -------------------
137 64
138 See `TRANSIT documentation` 65 Note: Can be used for both Himar1 and Tn5 datasets.
139 66
140 - TRANSIT: https://transit.readthedocs.io/en/latest/index.html 67
141 - `TRANSIT Resampling`: https://transit.readthedocs.io/en/latest/transit_methods.html#re-sampling 68
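The permutation procedure described under **What it does** can be sketched in Python as follows. This is a minimal illustration, not TRANSIT's implementation: the function `resampling_test` and its arguments are hypothetical, the counts are assumed to be already normalized, and the test is assumed to be two-sided.

```python
import random
from typing import List, Sequence, Tuple


def resampling_test(
    ctrl: Sequence[Sequence[float]],
    exp: Sequence[Sequence[float]],
    samples: int = 10000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Permutation test on the difference in total read-counts for one gene.

    ctrl (condition A) and exp (condition B) hold one list per replicate,
    each with one normalized read-count per TA site in the gene.
    Returns (observed difference B - A, two-sided p-value).
    """
    rng = random.Random(seed)
    # Flatten replicates: every (replicate, TA site) count is one observation.
    ctrl_flat: List[float] = [c for rep in ctrl for c in rep]
    exp_flat: List[float] = [c for rep in exp for c in rep]
    n_ctrl = len(ctrl_flat)

    # Observed statistic: total in condition B minus total in condition A.
    observed = sum(exp_flat) - sum(ctrl_flat)

    pooled = ctrl_flat + exp_flat
    extreme = 0
    for _ in range(samples):
        # Randomly reassign the pooled counts to the two conditions,
        # keeping the number of observations per condition fixed.
        rng.shuffle(pooled)
        delta = sum(pooled[n_ctrl:]) - sum(pooled[:n_ctrl])
        if abs(delta) >= abs(observed):
            extreme += 1

    # Fraction of permutations at least as extreme as the observed difference.
    return observed, extreme / samples


delta, p = resampling_test(
    ctrl=[[0, 2, 1], [1, 0, 0]],
    exp=[[40, 35, 52], [38, 41, 47]],
    samples=2000,
)
print(delta, p)
```

With `samples=10000`, the smallest nonzero p-value this sketch can return is 1/10000, which is the "resolution" trade-off mentioned under Samples; a p-value of exactly 0 only means that none of the sampled permutations was as extreme as the observed difference.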
142 ]]></help> 69 -------------------
70
71 **Inputs**
72
73 -------------------
74
75 Input files for Resampling need to be:
76
77 - .wig files : Tabulated files containing one column with the TA site coordinate and one column with the read count at this site.
78 - annotation .prot_table : Annotation file generated by the `Convert Gff3 to prot_table for TRANSIT` tool.
79
80
81 -------------------
82
83 **Parameters**
84
85 -------------------
86
87 Optional Arguments:
88
89 -s <integer> := Number of samples. Default: -s 10000
90 -n <string> := Normalization method. Default: -n TTR
91 -h := Output histogram of the permutations for each gene. Default: Turned Off.
92 -a := Perform adaptive resampling. Default: Turned Off.
93 -ez := Exclude rows with zero across conditions. Default: Turned off
94
95 --pc := Pseudocounts to be added at each site.
96 -l := Perform LOESS Correction; Helps remove possible genomic position bias.
97 Default: Turned Off.
98 --iN <float> := Ignore TAs occurring at given fraction of the N terminus. Default: -iN 0.0
99 --iC <float> := Ignore TAs occurring at given fraction of the C terminus. Default: -iC 0.0
100 --ctrl_lib := String of letters representing library of control files in order
101 e.g. 'AABB'. Default empty. Letters used must also be used in --exp_lib
102 If non-empty, resampling will limit permutations to within-libraries.
103 --exp_lib := String of letters representing library of experimental files in order
104 e.g. 'ABAB'. Default empty. Letters used must also be used in --ctrl_lib
105 If non-empty, resampling will limit permutations to within-libraries.
106
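When `--ctrl_lib`/`--exp_lib` are set, permutations are limited to within libraries. A minimal sketch of one way to do that, assuming each letter labels one replicate file in order and that counts are shuffled only among files sharing a letter; `within_library_null` is a hypothetical name, and TRANSIT may combine the per-library statistics differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def within_library_null(
    ctrl: Sequence[Sequence[float]],
    exp: Sequence[Sequence[float]],
    ctrl_lib: str,
    exp_lib: str,
    samples: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Null distribution of (total B - total A) where counts are only
    shuffled among replicate files that share a library letter."""
    if len(ctrl_lib) != len(ctrl) or len(exp_lib) != len(exp):
        raise ValueError("need exactly one library letter per replicate file")
    if set(ctrl_lib) != set(exp_lib):
        raise ValueError("ctrl_lib and exp_lib must use the same letters")

    # Per library letter: pooled counts (control entries first) and the
    # number of control entries, so group sizes stay fixed when shuffling.
    pools: Dict[str, List[float]] = defaultdict(list)
    n_ctrl: Dict[str, int] = defaultdict(int)
    for rep, lib in zip(ctrl, ctrl_lib):
        pools[lib].extend(rep)
        n_ctrl[lib] += len(rep)
    for rep, lib in zip(exp, exp_lib):
        pools[lib].extend(rep)

    rng = random.Random(seed)
    null: List[float] = []
    for _ in range(samples):
        delta = 0.0
        for lib in sorted(pools):
            pool = pools[lib]
            rng.shuffle(pool)  # counts never cross library boundaries
            k = n_ctrl[lib]
            delta += sum(pool[k:]) - sum(pool[:k])
        null.append(delta)
    return null


null = within_library_null(
    ctrl=[[1, 1], [100, 100]],
    exp=[[1, 1], [100, 100]],
    ctrl_lib="AB",
    exp_lib="AB",
    samples=500,
)
print(set(null))  # {0.0}
```

In this example an unrestricted shuffle of the same eight counts would mix 1s and 100s and produce nonzero differences; the within-library version cannot, because each library's pool contains only one value.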
107
108 The resampling method is non-parametric, and therefore does not require any parameters governing the distributions or the model. The following parameters are available for the method:
109
110 - Samples: The number of samples (permutations) to perform. The larger the number of samples, the finer the resolution of the calculated p-values, at the expense of longer computation time. The re-sampling method runs on 10,000 samples by default.
111 - Output Histograms: Determines whether to output .png images of the histograms obtained from resampling the difference in read-counts.
112 - Adaptive Resampling: An optional “adaptive” version of resampling that accelerates the calculation by terminating early for genes that are likely not significant. This dramatically speeds up the computation at the cost of less accurate estimates for those genes that terminate early (i.e. deemed not significant). This option is OFF by default.
113 - Include Zeros: Select to include sites that are zero. This is the preferred behavior; however, unselecting this (thus ignoring sites that are zero across all datasets, i.e. completely empty) is useful for decreasing running time (especially for large datasets like Tn5).
114 - Normalization Method: Determines which normalization method to use when comparing datasets. Proper normalization is important as it ensures that other sources of variability are not mistakenly treated as real differences. See the Normalization section of the TRANSIT documentation for a description of the normalization methods available in TRANSIT.
115
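The adaptive option stops early for genes that look non-significant. The sketch below shows the idea with a single checkpoint; the checkpoint (100 permutations) and the cutoff (running p-value above 0.1) are illustrative assumptions, not TRANSIT's actual stopping rule, and `adaptive_resampling` is a hypothetical name.

```python
import random
from typing import Sequence, Tuple


def adaptive_resampling(
    ctrl: Sequence[float],
    exp: Sequence[float],
    samples: int = 10000,
    checkpoint: int = 100,
    stop_above: float = 0.1,
    seed: int = 0,
) -> Tuple[float, int]:
    """Permutation p-value with one early-stopping check.

    ctrl/exp are flattened (replicate x TA site) normalized counts.
    Returns (p-value estimate, number of permutations actually run).
    checkpoint and stop_above are illustrative values, not TRANSIT's.
    """
    rng = random.Random(seed)
    observed = sum(exp) - sum(ctrl)
    pooled = list(ctrl) + list(exp)
    n_ctrl = len(ctrl)
    extreme = 0
    for i in range(1, samples + 1):
        rng.shuffle(pooled)
        delta = sum(pooled[n_ctrl:]) - sum(pooled[:n_ctrl])
        if abs(delta) >= abs(observed):
            extreme += 1
        # After a few permutations, give up on genes whose running
        # p-value is already far from significant (coarse estimate).
        if i == checkpoint and extreme / i > stop_above:
            return extreme / i, i
    return extreme / samples, samples


print(adaptive_resampling([3, 4, 5, 4], [4, 3, 5, 4]))  # (1.0, 100)
print(adaptive_resampling([0, 1, 0, 2, 1, 0], [40, 35, 52, 38, 41, 47], samples=1000))
```

The first gene has no difference between conditions and stops after 100 permutations; the second keeps going for all 1000, so only genes that might be significant pay for the full number of samples.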
116
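Excluding zeros (`-ez`, i.e. unselecting Include Zeros) removes only sites that are empty in every dataset. A minimal sketch of that filter, assuming the counts are arranged as one row per TA site and one column per .wig file; `nonempty_sites` is a hypothetical helper.

```python
from typing import List, Sequence


def nonempty_sites(counts: Sequence[Sequence[float]]) -> List[int]:
    """Indices of TA sites kept when zero rows are excluded.

    counts has one row per TA site and one column per dataset
    (all control and experimental .wig files); a row is dropped only
    if it is zero in every dataset.
    """
    return [i for i, row in enumerate(counts) if any(c != 0 for c in row)]


sites = [
    [0, 0, 0, 0, 0],  # empty in every dataset: dropped
    [0, 3, 0, 0, 1],
    [0, 0, 0, 0, 0],  # dropped
    [7, 0, 2, 0, 0],
]
print(nonempty_sites(sites))  # [1, 3]
```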
117 -------------------
118
119 **Outputs**
120
121 -------------------
122
123 The re-sampling method outputs a tab-delimited file with results for each gene in the genome. P-values are adjusted for multiple comparisons using the Benjamini-Hochberg procedure (called “q-values” or “p-adj.”). A typical threshold for conditional essentiality is q-value < 0.05.
124
125 ============================================= ========================================================================================================================
126 **Column Header** **Column Definition**
127 --------------------------------------------- ------------------------------------------------------------------------------------------------------------------------
128 Orf Gene ID
129 Name Gene Name
130 Desc Gene Description
131 N Number of TA sites in the gene.
132 TAs Hit Number of TA sites with at least one insertion.
133 Sum Rd 1 Sum of read counts in condition 1.
134 Sum Rd 2 Sum of read counts in condition 2.
135 Delta Rd Difference in the sum of read counts.
136 p-value P-value calculated by the permutation test.
137 p-adj. Adjusted p-value controlling for the FDR (Benjamini-Hochberg)
138 ============================================= ========================================================================================================================
139
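The “p-adj.” column corresponds to applying the standard Benjamini-Hochberg step-up procedure to the “p-value” column. A minimal sketch; the function name is hypothetical, and TRANSIT's own implementation may differ in details such as tie handling.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down: each q-value is the minimum of
    # p_(j) * m / j over all ranks j at or above its own rank, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(x, 3) for x in q])  # [0.02, 0.04, 0.04, 0.02]
significant = [i for i, x in enumerate(q) if x < 0.05]
print(significant)  # [0, 1, 2, 3]
```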
140
141
142 -------------------
143
144 **More Information**
145
146 -------------------
147
148 See `TRANSIT documentation`
149
150 - TRANSIT: https://transit.readthedocs.io/en/latest/index.html
151 - `TRANSIT Resampling`: https://transit.readthedocs.io/en/latest/transit_methods.html#re-sampling
152 ]]></help>
143 153
144 <expand macro="citations" /> 154 <expand macro="citations" />
145 </tool> 155 </tool>