Abstract
N4-acetylcytidine (ac4C) is a post-transcriptional mRNA modification that plays a major role in mRNA stability and in the regulation of mRNA translation. The mechanism by which ac4C modification acts in mRNA is still unclear, and traditional laboratory experiments are time-consuming and expensive. We therefore propose XG-ac4C, a machine learning model based on the eXtreme Gradient Boosting (XGBoost) classifier, for the identification of ac4C sites. XG-ac4C encodes the nucleotides in ac4C sites with a combination of electron-ion interaction pseudopotentials (EIIP) and electron-ion interaction pseudopotentials of trinucleotides (PseEIIP). Moreover, Shapley additive explanations and local interpretable model-agnostic explanations are applied to understand the importance of the features and their contribution to the final prediction. The results show that XG-ac4C outperforms existing state-of-the-art methods: the proposed model improves the area under the precision-recall curve by 9.4% and 9.6% in cross-validation and independent tests, respectively. Finally, a user-friendly web server based on the proposed model for ac4C site identification is freely available at http://nsclbio.jbnu.ac.kr/tools/xgac4c/.
Highlights
Introduction: ac4C occurs on cytidine and is the only acetylation modification in eukaryotic mRNA[2]
More than 160 different RNA modifications have been identified[1]
We adopt the combination of electron-ion interaction pseudopotentials (EIIP) and PseEIIP to encode mRNA sequences for ac4C site identification
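The EIIP and PseEIIP encoding named in the highlight above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the EIIP values are the commonly cited nucleotide values (with U assumed to take the value of T), PseEIIP is assumed to be the trinucleotide frequency weighted by the summed EIIP of its three bases, and the function names `eiip_encode`, `pse_eiip_encode`, and `encode` are hypothetical.

```python
from itertools import product

# Commonly cited EIIP values per nucleotide (assumption: U reuses the value of T).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}

# All 64 trinucleotides in a fixed order, so feature positions are stable.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def normalize(seq):
    """Upper-case the sequence and map DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def eiip_encode(seq):
    """Position-wise EIIP: one value per nucleotide (0.0 for unknown bases)."""
    return [EIIP.get(base, 0.0) for base in normalize(seq)]


def pse_eiip_encode(seq):
    """PseEIIP: trinucleotide frequency times the summed EIIP of its bases."""
    seq = normalize(seq)
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing unknown bases
            counts[tri] += 1
            total += 1
    features = []
    for tri in TRINUCLEOTIDES:
        freq = counts[tri] / total if total else 0.0
        features.append(freq * sum(EIIP[b] for b in tri))
    return features


def encode(seq):
    """Concatenate both encodings into one feature vector."""
    return eiip_encode(seq) + pse_eiip_encode(seq)
```

For a sequence window of length n, `encode` returns n + 64 values; a vector of this kind could then be passed to a gradient-boosting classifier.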
Summary
ac4C occurs on cytidine and is the only acetylation modification in eukaryotic mRNA[2]. The role of ac4C in the regulation of mRNA translation and the promotion of translation efficiency was established by Arango et al.[3] An analysis of mRNA half-life showed that the acetylation level and the stability of target mRNA are positively correlated. NAT10 mutation decreases detection of ac4C at the mapped mRNA sites and is associated with down-regulation of target mRNA. The acetylated residues expand the repertoire of mRNA modifications and establish the role of ac4C in the regulation of mRNA translation. The PACES predictor was proposed for classification of ac4C modification sites in human mRNA[6]. In this study, we propose a computational model based on the eXtreme Gradient Boosting (XGBoost) method to identify ac4C modification sites in mRNA. We built a user-friendly web server for the proposed model, which is freely accessible at http://nsclbio.jbnu.ac.kr/tools/xgac4c/