changeset 94:7d9af0d824ad draft
Uploaded
| author | greg | 
|---|---|
| date | Tue, 05 Sep 2017 08:38:49 -0400 | 
| parents | 0c2cf49dfb58 | 
| children | ff4d84a01fa7 | 
| files | ideas.xml static/images/ideas.png | 
| diffstat | 2 files changed, 30 insertions(+), 6 deletions(-) | 
--- a/ideas.xml Tue Aug 29 13:05:25 2017 -0400
+++ b/ideas.xml Tue Sep 05 08:38:49 2017 -0400
@@ -237,12 +237,36 @@
 <help>
 **What it does**
 
-Employs the IDEAS (Integrative and Discriminative Epigenome Annotation System) method for jointly and quantitatively characterizing
-multivariate epigenetic landscapes in many cell types, tissues or conditions. The method accounts for position dependent epigenetic
-events and detects local cell type relationships, which not only help to improve the accuracy of annotating functional classes of DNA
-sequences, but also reveal cell type constitutive and specific loci. The method utilizes Bayesian non-parametric techniques to automatically
-identify the best model size fitting to the data so users do not have to specify the number of states. On the other hand, users can
-still specify the number of states if desired.
+IDEAS (an **I**ntegrative and **D**iscriminative **E**pigenome **A**nnotation **S**ystem) identifies de novo
+regulatory functions from epigenetic data in multiple cell types jointly. It is a full probabilistic model
+defined on all data, and it combines signals across both the genome and cell types to boost power. The underlying
+assumption of IDEAS is that, because all cell types share the same underlying DNA sequences, **functions of each
+DNA segment should be correlated**. Also, cell type specific regulation is locus-dependent, and thus IDEAS uses
+local epigenetic landscape to **identify de novo and local cell type clusters** without assuming or requiring a
+known global cell type relationship.
+
+IDEAS takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc)
+or any other whole-genome data sets (e.g., scores). Currently the supported data formats include BigWig and BAM.
+All data sets will first be mapped by IDEAS to a common genomic coordinate in a selected assembly (200bp windows
+by default, or user-provided). The user can specify regions to be considered or removed from the analysis. The
+input data may come from one cell type/condition/individual/time point (although it does not fully utilize the
+advantage of IDEAS), or from multiple cell types/conditions/individuals/time points. The same set of epigenetic
+features may not be present in all cell types, for which IDEAS will do imputation of the missing tracks if
+specified.
+
+.. image:: $PATH_TO_IMAGES/ideas.png
+
+IDEAS predicts regulatory functions, denoted by epigenetic states, at each position in each cell type by
+**combining information simultaneously learned from other cell types** at the same positions in cell types with
+similar local epigenetic landscapes. Size of genomic intervals for determining the similarity are also learned.
+All of the inferences are done through parallel infinite-state hidden Markov models (iHMM), which is a Bayesian
+non-parametric technique to automatically determine the number of local cell type clusters and the number of
+epigenetic states.
+
+In addition to its improved power, IDEAS has two unique advantages:
+
+ 1) **linear time inference** with respect to the number of cell types, which allows it to study hundreds or more cell types jointly
+ 2) use mini-batch training to **improve reproducibility** of the predicted epigenetic states, which is important because genome segmentation is not convex and hence cannot guarantee a global optimal solution.
 
 -----
 
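Note on the added help text: it states that all data sets are first mapped to common genomic windows (200bp by default). The sketch below illustrates one way such a mapping could work. It is not taken from the IDEAS source: the function name `bin_signal`, the bedGraph-like `(start, end, value)` input, and the choice to average the signal within each window are all assumptions made for illustration.

```python
from typing import List, Tuple


def bin_signal(
    intervals: List[Tuple[int, int, float]],
    chrom_length: int,
    window: int = 200,
) -> List[float]:
    """Average a piecewise-constant signal over fixed-size windows.

    Hypothetical helper, not part of IDEAS. `intervals` holds half-open,
    0-based (start, end, value) rows for a single chromosome, in the
    style of bedGraph. Bases not covered by any interval count as 0.
    """
    n_windows = (chrom_length + window - 1) // window
    totals = [0.0] * n_windows

    for start, end, value in intervals:
        # Clip the interval to the chromosome bounds.
        pos = max(start, 0)
        end = min(end, chrom_length)
        # Walk the interval window by window, adding value * covered bases.
        while pos < end:
            idx = pos // window
            chunk_end = min((idx + 1) * window, end)
            totals[idx] += value * (chunk_end - pos)
            pos = chunk_end

    means = []
    for idx, total in enumerate(totals):
        # The last window may be shorter than `window`.
        width = min(window, chrom_length - idx * window)
        means.append(total / width)
    return means
```

For example, `bin_signal([(0, 100, 2.0), (150, 450, 4.0)], chrom_length=500)` yields three windows, each summarizing the signal that overlaps it. Putting every track on the same window grid in this way is what would make tracks from different cell types directly comparable position by position.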
