Abstract
Super-enhancers (SEs) are clusters of transcriptional enhancers that control the expression of cell identity and disease-associated genes. Recent studies have demonstrated the role of multiple factors in SE formation; however, a systematic analysis assessing the relative predictive importance of chromatin and sequence features of SEs and their constituents is lacking. In addition, a predictive model that integrates various types of data to predict SEs has not been established. Here, we integrated diverse types of genomic and epigenomic datasets to identify key signatures of SEs and investigated their predictive importance. Through integrative modeling, we identified Cdk8, Cdk9, and Smad3 as new features of SEs, which can define known and new SEs in mouse embryonic stem cells and pro-B cells. We compared six state-of-the-art machine learning models for predicting SEs and showed that non-parametric ensemble models outperformed parametric ones. We validated these models using cross-validation and independent datasets from four human cell types. Taken together, our systematic analysis and ranking of features can be used as a platform to define and understand the biology of SEs in other cell types.
Highlights
Enhancers are cis-regulatory regions in the DNA that augment the transcription of associated genes and play a key role in cell-type specific gene expression[1,2]
Through the ranking of chromatin regulators and transcription factors (TFs), we found that Cdk8, Cdk9, and Smad3 were important features along with many well-known chromatin signatures of SEs, including H3K27ac, Brd4, Med12, and p300[17–20,29]
We presented a systematic approach to rank and assess the importance of different features of SEs
Summary
Enhancers are cis-regulatory regions in the DNA that augment the transcription of associated genes and play a key role in cell-type-specific gene expression[1,2]. Many factors have been associated with enhancer activity, including monomethylation of histone H3 at lysine 4 (H3K4me1), acetylation of histone H3 at lysine 27 (H3K27ac), binding of the coactivator proteins p300 and CBP, and DNase I hypersensitivity sites (DHSs)[8,10,11]. By exploiting these factors and other genomic features, many supervised and unsupervised machine learning approaches have been developed to predict enhancers genome-wide[4,12,13]. To identify key features of SEs and to investigate their relative contribution to the prediction of SEs, we integrated diverse types of publicly available datasets, including ChIP-seq data for histone modifications, chromatin regulators and TFs, DHSs, and genomic data. Our feature ranking and analysis can serve as a resource to further characterize and understand SEs in other cell types.
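To make the idea of ranking features by predictive importance concrete, the following is a minimal sketch and not the authors' pipeline. It scores each feature on its own by the ROC AUC it achieves in separating two classes, then sorts the features by that score. This univariate AUC ranking is a simple stand-in chosen for illustration. The data are synthetic, and the column names (`strong_mark`, `weak_mark`, `noise`) are hypothetical placeholders for per-region signal values such as ChIP-seq read densities.

```python
import random


def auc(scores, labels):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features(columns, labels):
    """Score every feature column by its univariate AUC, highest first."""
    scored = [(name, auc(values, labels)) for name, values in columns.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic toy data (an assumption, not real measurements):
# label 1 stands for a super-enhancer, label 0 for a typical enhancer.
rng = random.Random(0)
labels = [1] * 200 + [0] * 200

# Each hypothetical feature is Gaussian noise, shifted upward for label 1.
shifts = {"strong_mark": 3.0, "weak_mark": 1.5, "noise": 0.0}
columns = {
    name: [rng.gauss(shift * y, 1.0) for y in labels]
    for name, shift in shifts.items()
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: AUC = {score:.3f}")
```

A univariate score ignores interactions between features. Importance measures derived from ensemble models fitted under cross-validation would capture such interactions, at the cost of more machinery than this sketch shows.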