Abstract
Background: Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly being applied to study genome-wide binding sites of transcription factors. There is growing interest in understanding the mechanism of action of co-regulator proteins, which do not bind DNA directly but exert their effects by binding to transcription factors such as the estrogen receptor (ER). However, because the protein-DNA interaction being detected is indirect, ChIP-seq signals from co-regulators can be relatively weak, and biologically meaningful interactions remain difficult to identify.
Results: In this study, we investigated and compared different statistical and machine learning approaches, including unsupervised, supervised, and semi-supervised (self-training) classification, to integrate multiple types of genomic and transcriptomic information derived from our experiments and public databases. The goal was to overcome the difficulty of identifying functional DNA binding sites of the co-regulator SRC-1 in the context of estrogen response. Our results indicate that supervised learning with the naïve Bayes algorithm significantly enhances peak calling of weak ChIP-seq signals and outperforms the other machine learning algorithms. Our integrative approach revealed many potential ERα/SRC-1 DNA binding sites that would otherwise be missed by conventional peak calling algorithms with default settings.
Conclusions: Our results indicate that a supervised classification approach enables one to use limited amounts of prior knowledge together with multiple types of biological data to enhance the sensitivity and specificity of the identification of DNA binding sites of co-regulator proteins.
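To make the supervised idea concrete, the following is a minimal sketch of a Gaussian naïve Bayes classifier over candidate peaks. It is not the authors' pipeline: the Gaussian likelihood, the two features (SRC-1 tag count and ERα tag count), the variance-smoothing constant `eps`, and all toy values are assumptions chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int],
                    eps: float = 1e-3) -> Dict[int, dict]:
    """Estimate a class prior plus per-feature mean and variance for each label."""
    model: Dict[int, dict] = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        # eps is an assumed smoothing term that keeps variances strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(rows) + eps
                     for c, m in zip(cols, means)]
        model[label] = {"prior": len(rows) / n, "means": means, "vars": variances}
    return model

def log_posterior(model: Dict[int, dict], x: Sequence[float]) -> Dict[int, float]:
    """Unnormalized log posterior per class, treating features as independent."""
    scores: Dict[int, float] = {}
    for label, p in model.items():
        s = math.log(p["prior"])
        for v, m, var in zip(x, p["means"], p["vars"]):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = s
    return scores

def predict(model: Dict[int, dict], x: Sequence[float]) -> int:
    """Return the label with the highest log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Hypothetical training peaks: [SRC-1 tag count, ERα tag count].
# Label 1 = putative functional site, label 0 = background. Not data from the study.
X_train: List[List[float]] = [[12, 40], [15, 55], [10, 35], [3, 2], [4, 5], [2, 1]]
y_train: List[int] = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X_train, y_train)
strong_candidate = predict(model, [11, 45])
weak_candidate = predict(model, [3, 3])
```

In this toy setting a peak with a modest SRC-1 signal can still be assigned to the functional class when its ERα evidence is strong, which is the kind of evidence pooling the abstract describes.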
Highlights
Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly being applied to study genome-wide binding sites of transcription factors
The biological study underlying this paper investigates the impact of the estrogen receptor α (ERα)/SRC-1 interaction on estrogen-induced gene expression in a bone cell line transfected with ERα (U2OS-ERα), which may shed light on estrogen-related bone development, bone loss, and potentially bone metastasis
Integrating multiple sources of biological information for identifying SRC-1 binding sites: to corroborate the SRC-1 ChIP-seq results, we studied the ERα ChIP-seq data and investigated peaks that overlap between the ERα and SRC-1 ChIP-seq results
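The overlap check between two peak sets can be sketched as follows. This is an illustrative example only: the half-open coordinate convention, the function names, and the toy coordinates are assumptions, and the source does not state which tool or overlap criterion was actually used.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Peak = Tuple[str, int, int]

def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def overlapping_peaks(src1: List[Peak], er: List[Peak]) -> List[Peak]:
    """Return the SRC-1 peaks that overlap any ERα peak (simple pairwise scan)."""
    return [p for p in src1 if any(overlaps(p, q) for q in er)]

# Hypothetical toy coordinates, not taken from the study.
src1_peaks: List[Peak] = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
er_peaks: List[Peak] = [("chr1", 250, 400), ("chr2", 150, 300)]

shared = overlapping_peaks(src1_peaks, er_peaks)
print(shared)
```

The pairwise scan is quadratic in the number of peaks, which is fine for a toy example; genome-scale peak lists would normally be sorted first or handled with an interval index.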
Summary
Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly being applied to study genome-wide binding sites of transcription factors. ChIP-seq involves short-read (~30 bp) sequencing of the ChIP-enriched DNA fragments. These short sequence reads (tags) are aligned to a reference genome. Co-regulator ChIP-seq measures secondary protein-DNA binding through primary TFs and yields relatively weak sequencing signals, i.e., a relatively small number of sequence tags above noise. As such, it remains a challenge for contemporary peak calling methods to detect weak secondary protein-DNA binding signals while maintaining high specificity.