comparison ideas.xml @ 94:7d9af0d824ad draft

Uploaded
author greg
date Tue, 05 Sep 2017 08:38:49 -0400
parents 0c2cf49dfb58
children ff4d84a01fa7
comparison
93:0c2cf49dfb58 94:7d9af0d824ad
235 </test> 235 </test>
236 </tests> 236 </tests>
237 <help> 237 <help>
238 **What it does** 238 **What it does**
239 239
240 Employs the IDEAS (Integrative and Discriminative Epigenome Annotation System) method for jointly and quantitatively characterizing 240 IDEAS (an **I**ntegrative and **D**iscriminative **E**pigenome **A**nnotation **S**ystem) identifies de novo
241 multivariate epigenetic landscapes in many cell types, tissues or conditions. The method accounts for position dependent epigenetic 241 regulatory functions from epigenetic data in multiple cell types jointly. It is a full probabilistic model
242 events and detects local cell type relationships, which not only help to improve the accuracy of annotating functional classes of DNA 242 defined on all data, and it combines signals across both the genome and cell types to boost power. The underlying
243 sequences, but also reveal cell type constitutive and specific loci. The method utilizes Bayesian non-parametric techniques to automatically 243 assumption of IDEAS is that, because all cell types share the same underlying DNA sequences, **functions of each
244 identify the best model size fitting to the data so users do not have to specify the number of states. On the other hand, users can 244 DNA segment should be correlated**. Also, cell type specific regulation is locus-dependent, and thus IDEAS uses
245 still specify the number of states if desired. 245 local epigenetic landscape to **identify de novo and local cell type clusters** without assuming or requiring a
246 known global cell type relationship.
247
248 IDEAS takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc.)
249 or any other whole-genome data sets (e.g., scores). Currently the supported data formats include BigWig and BAM.
250 All data sets will first be mapped by IDEAS to a common genomic coordinate in a selected assembly (200bp windows
251 by default, or user-provided). The user can specify regions to be considered or removed from the analysis. The
252 input data may come from one cell type/condition/individual/time point (although it does not fully utilize the
253 advantage of IDEAS), or from multiple cell types/conditions/individuals/time points. The same set of epigenetic
254 features may not be present in all cell types, in which case IDEAS will impute the missing tracks if
255 specified.
256
257 .. image:: $PATH_TO_IMAGES/ideas.png
258
259 IDEAS predicts regulatory functions, denoted by epigenetic states, at each position in each cell type by
260 **combining information simultaneously learned from other cell types** at the same positions in cell types with
261 similar local epigenetic landscapes. The sizes of the genomic intervals used to determine similarity are also learned.
262 All inference is done through parallel infinite-state hidden Markov models (iHMMs), a Bayesian
263 non-parametric technique to automatically determine the number of local cell type clusters and the number of
264 epigenetic states.
265
266 In addition to its improved power, IDEAS has two unique advantages:
267
268 1) **linear time inference** with respect to the number of cell types, which allows it to study hundreds or more cell types jointly
269 2) mini-batch training to **improve reproducibility** of the predicted epigenetic states, which is important because genome segmentation is a non-convex problem with no guarantee of a globally optimal solution.
246 270
247 ----- 271 -----
248 272
249 **Options** 273 **Options**
250 274