Protein methylation is a vital regulator of many biological processes at the post-translational level, and accurate prediction of protein methylation sites is essential for research and drug discovery. In this paper, we present a new method, namely RMSxAI, to predict the arginine methylation sites from primary sequences using machine learning algorithms and describe the predictions using explainable artificial intelligence (XAI) techniques. Leveraging experimentally validated methylated and unmethylated protein sequences from diverse organisms, we deduced several sequence features, encompassing physicochemical properties, amino acid composition, and evolutionary insights. Our results show that the proposed RMSxAI can predict protein methylation sites with high accuracy, bringing the F1 score up to 0.88 and overall accuracy up to 88.4%. We use various XAI methods to explain the output results. These include key features, partial occupancy maps, and local variation models that provide insight into key features and interactions that lead to predictions. Overall, our approach is relevant to research and drug discovery, and our results demonstrate the potential of machine learning algorithms and XAI methods to provide accurate and meaningful prediction of arginine methylation sites.
Read full abstract