comparison ideas.xml @ 94:7d9af0d824ad draft
Uploaded
author | greg |
---|---|
date | Tue, 05 Sep 2017 08:38:49 -0400 |
parents | 0c2cf49dfb58 |
children | ff4d84a01fa7 |
comparison
93:0c2cf49dfb58 | 94:7d9af0d824ad |
---|---|
235 </test> | 235 </test> |
236 </tests> | 236 </tests> |
237 <help> | 237 <help> |
238 **What it does** | 238 **What it does** |
239 | 239 |
240 Employs the IDEAS (Integrative and Discriminative Epigenome Annotation System) method for jointly and quantitatively characterizing | 240 IDEAS (an **I**ntegrative and **D**iscriminative **E**pigenome **A**nnotation **S**ystem) identifies de novo |
241 multivariate epigenetic landscapes in many cell types, tissues or conditions. The method accounts for position dependent epigenetic | 241 regulatory functions from epigenetic data in multiple cell types jointly. It is a full probabilistic model |
242 events and detects local cell type relationships, which not only help to improve the accuracy of annotating functional classes of DNA | 242 defined on all data, and it combines signals across both the genome and cell types to boost power. The underlying |
243 sequences, but also reveal cell type constitutive and specific loci. The method utilizes Bayesian non-parametric techniques to automatically | 243 assumption of IDEAS is that, because all cell types share the same underlying DNA sequences, **functions of each |
244 identify the best model size fitting to the data so users do not have to specify the number of states. On the other hand, users can | 244 DNA segment should be correlated**. Also, cell type specific regulation is locus-dependent, and thus IDEAS uses the |
245 still specify the number of states if desired. | 245 local epigenetic landscape to **identify de novo and local cell type clusters** without assuming or requiring a |
246 known global cell type relationship. | |
247 | |
248 IDEAS takes as input a list of epigenetic data sets (histones, chromatin accessibility, CpG methylation, TFs, etc.) | |
249 or any other whole-genome data sets (e.g., scores). Currently the supported data formats include BigWig and BAM. | |
250 All data sets will first be mapped by IDEAS to a common genomic coordinate system in a selected assembly (200bp windows | |
251 by default, or user-provided). The user can specify regions to be considered or removed from the analysis. The | |
252 input data may come from one cell type/condition/individual/time point (although this does not take full | |
253 advantage of IDEAS), or from multiple cell types/conditions/individuals/time points. The same set of epigenetic | |
254 features need not be present in all cell types; in that case, IDEAS will impute the missing tracks if | |
255 requested. | |
256 | |
257 .. image:: $PATH_TO_IMAGES/ideas.png | |
258 | |
259 IDEAS predicts regulatory functions, denoted by epigenetic states, at each position in each cell type by | |
260 **combining information simultaneously learned from other cell types** at the same positions in cell types with | |
261 similar local epigenetic landscapes. The sizes of the genomic intervals used to determine similarity are also learned. | |
262 All inference is done through parallel infinite-state hidden Markov models (iHMMs), a Bayesian | |
263 non-parametric technique that automatically determines the number of local cell type clusters and the number of | |
264 epigenetic states. | |
265 | |
266 In addition to its improved power, IDEAS has two unique advantages: | |
267 | |
268 1) **linear time inference** with respect to the number of cell types, which allows it to study hundreds or more cell types jointly | |
269 2) mini-batch training to **improve reproducibility** of the predicted epigenetic states, which is important because genome segmentation is a non-convex problem, so a globally optimal solution cannot be guaranteed. | |
246 | 270 |
247 ----- | 271 ----- |
248 | 272 |
249 **Options** | 273 **Options** |
250 | 274 |