Abstract

BackgroundGenome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative “stacked modeling” approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges.ResultsUsing a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and is more predictive of locations of external genomic annotations.ConclusionsThe full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.

Highlights

  • Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the noncoding genome, including identifying regulatory elements

  • We note that not all chromatin marks were profiled in all cell or tissue types, but the stacked modelling can still be applied directly

  • We focused our analysis on a model with 100 states (“Methods”)

Read more

Summary

Introduction

Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the noncoding genome, including identifying regulatory elements Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. Large collections of per-celltype chromatin state annotations have been generated using either (1) independent models that learn a different set of states in each cell type or (2) a single model that is learned across all cell types, resulting in a common set of states across cell types, yet generating per-cell-type annotations (in some cases per-tissue-type annotations are generated, but we will use the terms cell-type and tissue interchangeably for ease of presentation). These models that produce per-cell-type annotations tend to be most appropriate in studies where researchers are interested in studying individual cell types

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call