TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites

Shengli Zhang, Yujie Xu, Yunyun Liang

doi:10.1016/j.csbj.2023.11.052

Abstract

RNA N7-methylguanosine (m7G) is a crucial chemical modification of RNA molecules that helps maintain RNA function and protein translation. Studying and predicting m7G sites aids in understanding the biological function of RNA and in developing new drug therapy regimens. Machine learning and deep learning techniques have proven effective for predicting m7G sites, improving both accuracy and identification efficiency. In this study, we propose TMSC-m7G, a model built on the transformer framework that integrates natural language processing and deep learning to predict m7G sites. In TMSC-m7G, a combination of multi-sense-scaled token embedding and fixed-position embedding replaces traditional word embedding to extract contextual information from sequences. Moreover, a convolutional layer is added to the encoder to compensate for the transformer's limited ability to capture local information. The model's robustness and generalization are validated through 10-fold cross-validation and an independent dataset test. The results show that TMSC-m7G compares favorably with the most advanced models available: its accuracy reaches 98.70% on the benchmark dataset and 92.92% on the independent dataset. To make the model easy to use, we have developed an intuitive online prediction tool, freely accessible at http://39.105.212.81/.
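To make the embedding and encoder ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the fixed-position embedding is the standard sinusoidal encoding, uses a plain random lookup table as a stand-in for the multi-sense-scaled token embedding, and uses a single residual 1D convolution to illustrate how local context can be mixed in. The names `tokenize`, `sinusoidal_positions`, `conv1d_same`, and `encode_sketch`, as well as the dimension and kernel size, are hypothetical choices for illustration.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # assumed vocabulary; T is treated as U


def tokenize(seq):
    """Map each nucleotide to an integer id (hypothetical vocabulary)."""
    seq = seq.upper().replace("T", "U")
    return np.array([NUCLEOTIDES.index(ch) for ch in seq])


def sinusoidal_positions(length, dim):
    """Fixed (non-learned) sinusoidal position encoding; dim must be even."""
    pos = np.arange(length)[:, None]            # (length, 1)
    i = np.arange(0, dim, 2)[None, :]           # (1, dim / 2)
    angles = pos / np.power(10000.0, i / dim)   # (length, dim / 2)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


def conv1d_same(x, kernel):
    """Zero-padded 1D convolution over positions.

    x: (length, dim_in); kernel: (k, dim_in, dim_out) with odd k.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([
        np.einsum("kd,kde->e", padded[t:t + k], kernel)
        for t in range(x.shape[0])
    ])


def encode_sketch(seq, dim=8, kernel_size=3, seed=0):
    """Token embedding + fixed positions, then a residual local convolution."""
    rng = np.random.default_rng(seed)
    # Stand-in for the multi-sense-scaled token embedding (assumption).
    table = rng.normal(size=(len(NUCLEOTIDES), dim))
    x = table[tokenize(seq)] + sinusoidal_positions(len(seq), dim)
    kernel = rng.normal(size=(kernel_size, dim, dim)) * 0.1
    return x + conv1d_same(x, kernel)  # residual connection keeps the input signal


if __name__ == "__main__":
    features = encode_sketch("GGACUGGAC")
    print(features.shape)
```

In a full model these per-position features would pass through self-attention layers and a classifier; the sketch only shows how position information and local neighbourhood information can be combined before that stage.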

Full Text