DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion.

Lu Zhang, Guangzhong Liu, Ziwei Xu, Min Liu, Xinyi Qin

doi:10.3390/genes12030354

Abstract

N6-methyladenosine (m6A) is a prevalent post-transcriptional modification of RNA that plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. Because traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method for detecting m6A sites is needed. In this study, we propose DNN-m6A, a novel cross-species computational method based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Second, we use elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. Five-fold cross-validation on the training datasets shows that DNN-m6A outperforms the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58–83.38% and an area under the curve (AUC) of 81.39–91.04%. On the independent datasets, DNN-m6A achieved an ACC of 72.95–83.04% and an AUC of 80.79–91.09%, which indicates that the proposed method generalizes well.
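As an illustration of the feature-fusion step, the following minimal sketch encodes a 41-nt sequence with two of the listed descriptors, BE and NCP, and concatenates the results. The bit order of the one-hot table, the NCP triples (the commonly used ring structure, functional group and hydrogen bond assignment), the function names and the example sequence are all assumptions made for illustration; they are not taken from the DNN-m6A code.

```python
# Assumed one-hot order A, C, G, U; the paper's own ordering may differ.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Assumed NCP triples: (ring structure, functional group, hydrogen bond).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq, table):
    """Map each nucleotide to its tuple in `table` and flatten the result."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [value for base in seq for value in table[base]]


def fuse(seq):
    """Concatenate the BE and NCP encodings into one feature vector."""
    return encode(seq, BINARY) + encode(seq, NCP)


# Hypothetical 41-nt sequence with the candidate adenosine at the centre.
example = "GGACU" * 4 + "A" + "CUGGA" * 4
features = fuse(example)
```

For a 41-nt input, BE contributes 41 × 4 values and NCP contributes 41 × 3, so the fused vector in this sketch has 287 entries. The remaining descriptors would be appended in the same way before feature selection.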

Highlights

The post-transcriptional modification of RNA increases the complexity of biological information and the fineness of regulation
The feature extraction methods need to select the optimal parameters, which have a vital effect on the construction of prediction models. In this study, the parameters λ and w of pseudo dinucleotide composition (PseDNC), and Kmax of KSNPFs can influence the performance of classification models
Considering the nucleotide sequence length in the datasets is 41, we search for the best values of the two parameters in the range of w ∈ [0.1, 0.9] and λ ∈ [10, 30] with steps of 0.2 and 10, respectively
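
The search space described in the last highlight can be enumerated as in the sketch below. The function names, the `evaluate` callback and the length check are hypothetical helpers introduced for illustration; the check assumes the usual PseDNC constraint that λ must be smaller than the number of dinucleotides in the sequence (40 for a 41-nt sequence).

```python
from itertools import product


def build_grid(w_start=0.1, w_stop=0.9, w_step=0.2,
               lam_start=10, lam_stop=30, lam_step=10, seq_len=41):
    """Return every (w, lambda) pair in the stated ranges."""
    n_w = round((w_stop - w_start) / w_step) + 1
    # Round to suppress floating-point drift such as 0.30000000000000004.
    w_values = [round(w_start + i * w_step, 10) for i in range(n_w)]
    lam_values = list(range(lam_start, lam_stop + 1, lam_step))
    # Assumed constraint: lambda < number of dinucleotides (seq_len - 1).
    assert all(lam < seq_len - 1 for lam in lam_values)
    return list(product(w_values, lam_values))


def select_best(grid, evaluate):
    """Pick the pair with the highest score from a user-supplied callback.

    `evaluate` is a placeholder for whatever validation metric is used,
    for example cross-validated accuracy.
    """
    return max(grid, key=lambda pair: evaluate(*pair))


grid = build_grid()
```

With the stated ranges and steps, this yields five values of w (0.1, 0.3, 0.5, 0.7, 0.9) and three values of λ (10, 20, 30), for 15 candidate pairs in total.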

Summary

Introduction

The post-transcriptional modification of RNA increases the complexity of biological information and the fineness of regulation. The two most representative types of methylation modification are N6-methyladenosine (m6A) [2] and 5-methylcytosine (m5C) [3,4,5,6]. Of the two, m6A is the more abundant: it is the most abundant internal modification on mRNA in eukaryotes, accounting for about 80% of all methylation forms. m6A refers to the methylation of the nitrogen atom at the sixth position of adenosine, catalyzed by methyltransferase complexes (e.g., METTL3, METTL14 and WTAP). As an important RNA post-transcriptional modification, m6A exists in a variety of species, including viruses, bacteria, plants, and mammals [10]. Studies have shown that m6A plays a regulatory role in almost every stage of mRNA metabolism [11].

Methods

Results

Conclusion