Abstract

Background

Enhancers are non-coding regions of the genome that control the activity of target genes. Recent efforts to identify active enhancers experimentally and in silico have proven effective. While these tools can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning the activity of enhancers are often unclear.

Results

Using machine learning (ML) and a rule-based explainable artificial intelligence (XAI) model, we demonstrate that we can predict the location of known enhancers in Drosophila with a high degree of accuracy. Most importantly, we use the rules of the XAI model to provide insight into the underlying combinatorial histone modification code of enhancers. In addition, we identified a large set of putative enhancers that display the same epigenetic signature as enhancers identified experimentally. These putative enhancers are enriched in nascent and divergent transcription and have 3D contacts with promoters of transcribed genes. However, they display only intermediate enrichment of mediator and cohesin complexes compared to previously characterised active enhancers. We also found that 10–15% of the predicted enhancers display characteristics similar to those of super enhancers observed in other species.

Conclusions

Here, we applied an explainable AI model to predict enhancers with high accuracy. Most importantly, we identified that different combinations of epigenetic marks characterise different groups of enhancers. Finally, we discovered a large set of putative enhancers that display characteristics similar to those of previously characterised active enhancers.

Highlights

  • Enhancers are non-coding regions of the genome that control the activity of target genes

  • To investigate how well the machine learning (ML) and explainable AI models generalise, we trained the models on data from BG3 cells and predicted enhancers in S2 cells using the corresponding histone modification ChIP datasets (Fig. 1B)

  • We found that 9% of enhancers display enrichment of H4K16ac, which in Drosophila has been mainly associated with dosage compensation [44, 45]

Summary

Introduction

Enhancers are non-coding regions of the genome that control the activity of target genes. Regulation of gene expression in eukaryotic cells is a complex process governed by interactions between DNA-binding proteins (transcription factors) and the regulatory elements in DNA to which they bind. Mutations in these non-coding regulatory elements can cause disease states by affecting the spatial and temporal control of gene expression [1,2,3,4]. In addition to not having a specific location in the genome, there is no general sequence code for enhancers, and a given enhancer may be active only in specific spatial, temporal, or environmental conditions [10]. All of these features complicate the discovery and annotation of enhancers, both experimentally and computationally.
