Abstract

High-throughput technologies such as chromatin immunoprecipitation (IP) followed by next-generation sequencing (ChIP-seq), in combination with gene expression studies, have enabled researchers to investigate relationships between the distribution of chromosome-associated proteins and the regulation of gene transcription on a genome-wide scale. Several attempts at integrative analyses have identified direct relationships between the two processes. However, a comprehensive understanding of the regulatory events remains elusive. This is in part due to the scarcity of robust analytical methods for the detection of binding regions from ChIP-seq data. In this paper, we apply a recently proposed Markov random field model for the detection of enriched binding regions under different biological conditions and time points. The method accounts for spatial dependencies and IP efficiencies, which can vary significantly between experiments. We further classify the enriched chromosomal binding regions by genomic feature (promoter, exon, intron, and distal intergenic), and then investigate how predictive each of these features is of gene expression activity using machine learning techniques, including neural networks, decision trees and random forests. The analysis of a ChIP-seq time-series dataset comprising six protein markers and associated microarray data, obtained from the same biological samples, shows promising results and identifies biologically plausible relationships between the protein profiles and gene regulation.

Highlights

  • Chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-seq) is a method used to identify the binding sites of chromosome-associated ('epigenetic') proteins; the term epigenetic is used in its broadest sense throughout this manuscript

  • All data values were collected from murine bone-marrow derived macrophages (BMDMs), stimulated with lipopolysaccharide (LPS), and from LPS stimulated BMDMs treated with a synthetic compound (I-BET)

  • The epigenetic data was generated from a ChIP-seq time-series dataset that included quantification of bromodomain-containing protein 4 (Brd4); acetylated histone H4 (H4ac); histone H3 lysine 4 tri-methylation (H3K4me3); RNA polymerase II (RNA PolII); subunit of RNA polymerase II (RNA PolII S2); and cyclin-dependent kinase 9 (CDK9)


Summary

Introduction

Chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-seq) is a method used to identify the binding sites of chromosome-associated ('epigenetic') proteins. (Note that the term epigenetic will be used in its broadest sense throughout this manuscript.) ChIP-seq in combination with gene expression data enables researchers to investigate relationships between the regulatory mechanisms of chromosome-bound proteins and gene expression responses on a genome-wide scale. For many studies, however, the ChIP-seq data is in the public domain but the corresponding gene expression data is not available, so it is not possible to understand how epigenetic modifications dictate gene expression responses [8]. We propose that machine learning models could be used to address such situations by modelling the relationships between observed gene expression responses and the corresponding epigenetic modifications. Once the association between gene expression and epigenetic regulatory events is defined, it should be possible to predict one from the other and to extrapolate this information into a deeper understanding of gene regulation mechanisms.
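To make the idea of predicting expression from binding features concrete, the following is a minimal sketch, not the method used in the paper. It assumes each gene is summarised by binary indicators of whether an enriched binding region overlaps its promoter, exon, intron, or distal intergenic region, and it scores each feature with a one-level decision tree (a decision stump). The feature names, the `GENES` toy data, and the helper functions are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names, mirroring the genomic categories in the abstract.
FEATURES = ["promoter", "exon", "intron", "distal_intergenic"]

# Made-up toy data (NOT from the study): for each gene, 1 means an enriched
# binding region overlaps that feature; the label is 1 if the gene is
# expressed and 0 otherwise.
Sample = Tuple[Dict[str, int], int]
GENES: List[Sample] = [
    ({"promoter": 1, "exon": 0, "intron": 1, "distal_intergenic": 0}, 1),
    ({"promoter": 1, "exon": 1, "intron": 0, "distal_intergenic": 0}, 1),
    ({"promoter": 1, "exon": 0, "intron": 0, "distal_intergenic": 1}, 1),
    ({"promoter": 0, "exon": 1, "intron": 1, "distal_intergenic": 0}, 0),
    ({"promoter": 0, "exon": 0, "intron": 1, "distal_intergenic": 1}, 0),
    ({"promoter": 0, "exon": 0, "intron": 0, "distal_intergenic": 1}, 0),
    ({"promoter": 1, "exon": 1, "intron": 1, "distal_intergenic": 0}, 1),
    ({"promoter": 1, "exon": 1, "intron": 0, "distal_intergenic": 0}, 0),
]


def stump_accuracy(samples: List[Sample], feature: str) -> float:
    """Training accuracy of the best single-feature rule for `feature`.

    A stump may predict "expressed when bound" or the reverse, so the
    better of the two polarities is returned.
    """
    n = len(samples)
    agree = sum(1 for x, y in samples if x[feature] == y)
    return max(agree, n - agree) / n


def rank_features(samples: List[Sample]) -> List[Tuple[str, float]]:
    """Rank features from most to least predictive on the given samples."""
    scores = [(f, stump_accuracy(samples, f)) for f in FEATURES]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, acc in rank_features(GENES):
        print(f"{name}: {acc:.3f}")
```

A stump is used here only because it is the simplest member of the decision-tree family; a random forest or neural network would combine several features at once, and any real analysis would evaluate accuracy on held-out genes rather than on the training data.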

