Abstract

Protein methylation is one of the most prominent post-translational modifications and plays an essential role in regulating several biological processes in eukaryotes. Identifying arginine methylation sites is therefore crucial for deciphering their characteristics and functions in cell biology and disease mechanisms, and for guiding drug development. Computational methods address a long-standing bottleneck of experimental methods: the cost, time, and labor required for large-scale identification of protein arginine methylation sites. In this study, we propose a robust machine learning-based computational tool, iIRMethyl, which uses the primary sequence and physicochemical properties of proteins together with a two-step feature selection method to select an optimal set of feature descriptors. The performance of iIRMethyl was comprehensively evaluated via k-fold cross-validation on a benchmark dataset and on an independent test dataset. iIRMethyl performed markedly better than the state-of-the-art method and achieved an average area under the curve (AUC) value of 0.99 for both k-fold cross-validation and the independent test set in the identification of protein arginine methylation sites. These outcomes indicate that iIRMethyl is a robust and accurate computational tool for large-scale identification of arginine methylation sites that would facilitate the understanding of their functional mechanisms and accelerate their application in drug development and clinical therapy. Additionally, the prediction mechanism of iIRMethyl is interpreted using the SHapley Additive exPlanations (SHAP) algorithm.
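For readers unfamiliar with the evaluation protocol named above, the following is a minimal sketch of how an average area under the curve (AUC) over k-fold cross-validation can be computed. It is not the iIRMethyl implementation: the function names (`auc`, `stratified_folds`, `cross_validated_auc`), the stratified splitting, and the `score_fn` callback standing in for a trained classifier are all illustrative assumptions.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds

def cross_validated_auc(labels, score_fn, k=5, seed=0):
    """Average AUC over k folds.

    score_fn(train_idx, test_idx) is a hypothetical callback that trains on
    train_idx and returns one score per index in test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    fold_aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores = score_fn(train_idx, test_idx)
        fold_aucs.append(auc([labels[j] for j in test_idx], scores))
    return sum(fold_aucs) / k
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of methylated from non-methylated sites, so the reported 0.99 means nearly every positive site is scored above nearly every negative one.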
