Abstract
DNA methylation is one of the most widespread epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulatory processes, so accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness restrict their usability in epigenetic studies, which points to a clear need for new computational methods. In this paper, we develop a novel computational predictor, 6mAPred-MSFF, a deep learning framework based on a multi-scale feature fusion mechanism that identifies 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build a lightweight yet deep neural network. Compared with existing predictors that use traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features, and it captures the characteristics of 6mA sites more effectively. In benchmarking comparisons, our method outperforms the state-of-the-art methods under 5-fold cross-validation on seven datasets from six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, 6mAPred-MSFF achieves a sensitivity of 97.88% and a specificity of 94.64% under 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained on the rice data also predicts the 6mA sites of five other species well: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster, with prediction accuracies of 98.51%, 93.02%, and 91.53%, respectively. Moreover, through experimental comparison, we explore how training and testing the proposed model under different encoding schemes and feature descriptors affects its performance.
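To make the reported metrics concrete, the following minimal sketch shows how sensitivity and specificity are conventionally computed from binary predictions, alongside a one-hot encoding of a DNA sequence. The one-hot scheme is an assumption chosen for illustration, since the abstract does not name the encoding schemes it compares, and the function names `sensitivity_specificity` and `one_hot` are hypothetical rather than taken from the 6mAPred-MSFF code.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = 6mA site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return sn, sp


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA sequence (assumed scheme, for illustration only).

    Bases outside A/C/G/T (for example N) map to an all-zero vector.
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]
```

In a 5-fold cross-validation setting, these metrics would typically be computed on each held-out fold and then averaged; that aggregation step is likewise an assumption about common practice, not a detail stated in the abstract.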
Highlights
Epigenetics refers to the reversible and heritable changes in gene function when there is no change in the nuclear DNA sequence [1]
Even though the above methods have improved the performance for identifying 6mA sites, too few data sets have been adopted to fully reflect the whole genome and to build robust models
We have proposed a novel predictor called 6mAPred-MSFF to predict the DNA 6mA sites. 6mAPred-MSFF is the first deep learning predictor, in which we integrate the global and local context by the inverted residual block and multi-scale channel
Summary
Epigenetics refers to the reversible and heritable changes in gene function when there is no change in the nuclear DNA sequence [1]. DNA methylation modifications play important roles in the epigenetic regulation of gene expression without altering the sequence, and they are widely distributed in the genomes of different species [2]. DNA methylation can be divided into three categories according to the position of the modification: N6-methyladenine (6mA), 5-methylcytosine (5mC), and N4-methylcytosine (4mC) [3,4]. Zhang et al. reveal that 6mA is a conserved DNA modification that is positively associated with gene expression and contributes to key agronomic traits in plants [11]. Some studies have found that N6-methyladenine DNA modification is widely distributed in the human genome and serves important biological functions.