Abstract

Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the number of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this strategy is computationally costly and propagates any imputation errors into the downstream genome segmentation results. We present an extension to the IDEAS genome segmentation platform that can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of the combinatorial regulatory patterns that appear on the human genome.
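The idea of segmenting without prior imputation can be illustrated with a deliberately simplified toy model. The sketch below is not the IDEAS model: it assumes a plain mixture of diagonal Gaussians over genomic bins, marks an unprofiled signal with NaN, and uses hypothetical function names (`em_missing`, `impute_after_segmentation`) and a quantile-based initialization chosen only for illustration. Under the diagonal assumption, the marginal density of a bin over its observed marks is the product over those marks alone, so missing entries drop out of both EM steps, and imputation can be done afterwards from the fitted states.

```python
import numpy as np

def em_missing(X, n_states, n_iter=50):
    """Toy EM for a diagonal Gaussian mixture with missing values.

    X: (bins x marks) array, NaN where a mark was not profiled.
    Assumes every mark is observed in at least one bin.
    """
    obs = ~np.isnan(X)                      # True where a signal was measured
    Xz = np.where(obs, X, 0.0)              # placeholder zeros, always masked by obs
    # Heuristic initialization (an assumption): spread state means over quantiles.
    q = (np.arange(n_states) + 0.5) / n_states
    mu = np.nanquantile(X, q, axis=0)       # shape (n_states, n_marks)
    var = np.tile(np.nanvar(X, axis=0) + 1e-6, (n_states, 1))
    pi = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # E-step: log marginal density over the observed marks only.
        diff2 = (Xz[:, None, :] - mu[None, :, :]) ** 2
        ll = -0.5 * (diff2 / var + np.log(2 * np.pi * var))
        ll = (ll * obs[:, None, :]).sum(axis=2) + np.log(pi)
        ll -= ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each state/mark parameter uses only bins where the mark is observed.
        w = resp[:, :, None] * obs[:, None, :]
        wsum = w.sum(axis=0) + 1e-12
        mu = (w * Xz[:, None, :]).sum(axis=0) / wsum
        var = (w * (Xz[:, None, :] - mu[None, :, :]) ** 2).sum(axis=0) / wsum + 1e-6
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
    return mu, var, pi, resp

def impute_after_segmentation(X, mu, resp):
    """Fill missing entries with the posterior-weighted state means."""
    return np.where(np.isnan(X), resp @ mu, X)
```

In this toy setting, `resp.argmax(axis=1)` plays the role of the state assignment per bin, and `impute_after_segmentation` fills gaps only after the states have been fitted, mirroring the reversed pipeline order described in the abstract.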

Highlights

  • The combinatorial activities of regulatory elements along the genome define cellular phenotypes during development and disease

  • Histone modifications and other gene regulatory signals can be profiled across the genome in a given cell type, and each type of regulatory signal correlates with the presence of specific gene regulatory activities

  • Genome segmentation methods look for patterns across combinations of regulatory signals to annotate more general “regulatory states”


Summary

Introduction

The combinatorial activities of regulatory elements along the genome define cellular phenotypes during development and disease. Thanks to the proliferation of genomic assays based on massively parallel sequencing technologies, we can comprehensively characterize genomic regulatory components using thousands of regulatory genomic datasets generated in hundreds of human and mouse cell types. Two major challenges are identifying interpretable regulatory events across the genome and characterizing how those events vary across cell types to affect expression and phenotype. Chromatin states inferred by genome segmentation are low-dimensional, de-noised representations of raw regulatory signals that produce interpretable catalogs of regulatory events in the genome. Chromatin states are valuable for studying gene regulation and disease, and hypotheses about regulatory relationships based on these states have been confirmed by functional experiments [4]. Chromatin states have been increasingly adopted as a powerful resource for prioritizing and interpreting non-coding disease variants [5,6,7,8,9].
