Abstract

BackgroundPrecise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput.ResultsWe propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841~ 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting attention mechanism. Last, we show that even superior performance with F1 scores 0.889~ 0.940 can be achieved by combining sequence embedding features and experimental features.ConclusionsEP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPIs identification.

Highlights

  • Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms

  • Recent studies have shown that noncoding single nucleotide polymorphisms (SNPs) that are associated with risk for numerous

  • Since EPIs could only happen between active enhancers and promoters, we used the full set of all enhancers and promoters as external resources to perform unsupervised feature extraction which would be elaborated

Read more

Summary

Introduction

Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. It is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. One of the major discoveries in recent years is that noncoding DNAs are not “junk”. On the contrary, they fulfill a wide variety of crucial biological roles involving regulatory and signaling functions [1]. The identification of true three-dimensional (3D) genome organization, especially EPIs across different cell lines constitutes important steps towards understanding of gene regulation, cell differentiation and disease mechanisms. All these traditional experimental approaches for detecting 3D genome interactions remain time-consuming and noisy, motivating the development of computational approaches

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call