Abstract
BackgroundLarger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences between large scale data sets.ResultsHere we bring to genomics a scenario of functional principal component analysis, a finite Karhunen-Loève transform, and explicitly decompose the variation in the coverage profiles of 27 chromatin mark ChIP-seq datasets at transcription start sites for H1, one of the most used human embryonic stem cell lines. Using this approach we identify positive correlations between H3K4me3 and H3K36me3, as well as between H3K9ac and H3K36me3, so far undetected by the most commonly used Pearson correlation between read enrichment coverages. We uncover highly negative correlations between H2A.Z, H3K4me3, and several histone acetylation marks, but these occur only between principal components of first and second order. We also demonstrate that levels of gene expression correlate significantly with scores of components of order higher than one, demonstrating that transcriptional regulation by histone marks escapes simple one-to-one relationships. This correlations were higher in significance and magnitude in protein coding genes than in non-coding RNAs.ConclusionsIn summary, we present a methodology to explore and uncover novel patterns of epigenomic variability and covariability in genomic data sets by using a functional eigenvalue decomposition of genomic data. R code is available at: http://github.com/pmb59/KLTepigenome.Electronic supplementary materialThe online version of this article (doi:10.1186/s13040-015-0051-7) contains supplementary material, which is available to authorized users.
Highlights
Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types
We demonstrate the applicability of functional principal component analysis (FPCA) for studying quantitative relationships between principal components of histone modifications and gene expression levels, which is otherwise limited due to the unidimensional analysis of read-enrichment commonly used for ChIP-seq data
The results strongly suggest that higher order principal components are chromatin mark features that have a direct impact on gene expression, and that correlate with the components of other chromatin marks, shedding light into the complexity of the cross-talk between these histone-code key players
Summary
Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences between large scale data sets. Madrigal and Krajewski BioData Mining (2015) 8:20 gene expression [2] Understanding these dynamic processes has become much easier with recent advances in high-throughput sequencing [3]. One of the key problems in computational epigenomics is the poor understanding of associations between epigenetic signals. A set of six key histone modifications (H3K4me, H3K4me, H3K9ac, H3K9me, H3K27me, H3K36me3) was initially defined for human epigenome analysis by the Roadmap Epigenomics Consortium to be most informative, and having relatively well-known function and antibodies of reasonable quality [3]. Novel analytical methods will facilitate the transition from the analysis of few to multiple epigenetic marks
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.