Abstract

Enhancers are short deoxyribonucleic acid fragments that assume an important part in the genetic process of gene expression. Due to their possibly distant location relative to the gene that is acted upon, the identification of enhancers is difficult. There are many published works focused on identifying enhancers based on their sequence information, however, the resulting performance still requires improvements. Using deep learning methods, this study proposes a model ensemble of classifiers for predicting enhancers based on deep recurrent neural networks. The input features of deep ensemble networks were generated from six types of dinucleotide physicochemical properties, which had outperformed the other features. In summary, our model which used this ensemble approach could identify enhancers with achieved sensitivity of 75.5%, specificity of 76%, accuracy of 75.5%, and MCC of 0.51. For classifying enhancers into strong or weak sequences, our model reached sensitivity of 83.15%, specificity of 45.61%, accuracy of 68.49%, and MCC of 0.312. Compared to the benchmark result, our results had higher performance in term of most measurement metrics. The results showed that deep model ensembles hold the potential for improving on the best results achieved to date using shallow machine learning methods.

Highlights

  • Enhancers are short deoxyribonucleic acid (DNA) fragments of 50–1500 base pairs that assume an important part in the genetic process of gene expression in which a DNA segment is replicated into ribonucleic acid (RNA) [1,2]

  • Xiong et al [6] retrieved the information of 10 tissues and identified the functions of enhancers across them. They found that enhancers associated with transcription factors (TF), enhancer-associated RNAs and target genes appeared in tissue specificity and function across different tissues

  • Our single model outperformed the ensemble model with achieved accuracy of 56.007% and Matthews Correlation Coefficient (MCC) of 0.125

Read more

Summary

Introduction

Enhancers are short deoxyribonucleic acid (DNA) fragments of 50–1500 base pairs (bp) that assume an important part in the genetic process of gene expression in which a DNA segment is replicated into ribonucleic acid (RNA) [1,2]. Due to their possibly distant location of up to 1 Mbp relative to the gene on which they perform a regulatory action, enhancers are difficult to identify using early biochemical experimental methods such as DNA foot-printing [3]. The latest work from Arbel et al [7] identified

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call