Prediction of phosphorylation sites based on granular support vector machine

Gong Cheng, Ruchang Zhang, Qingfeng Chen

doi:10.1007/s41066-019-00202-5

Abstract

Protein phosphorylation is the most widespread and important post-translational modification in eukaryotes and regulates the activity of almost all cells. Experimental methods for identifying phosphorylation sites, such as mass spectrometry, are costly and time-consuming. A number of algorithms have been developed to predict phosphorylation sites. However, they often train on a small subset of the data selected by random sampling, which cannot make full use of the characteristics of the entire data set when building a prediction model. Based on granular computing combined with kernel fuzzy C-means clustering, this paper maps the massive raw data into a high-dimensional kernel space and then divides the data into grains by clustering to obtain high-dimensional equilibrium grains. A granular support vector machine (KFCC–GSVM) prediction model is then built on the equilibrium grain data. This novel model improves the rationality and reliability of phosphorylation site data compression, so that, when the traditional SVM classification algorithm is applied, the compressed data have the same distribution in the kernel space as the data before compression. Experimental results demonstrate that our method outperforms both Musite, an SVM-based non-kinase-specific phosphorylation site prediction method, and the traditional GSVM method.
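To make the clustering step concrete, the following is a minimal sketch of kernel fuzzy C-means in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation: it uses one common variant (Gaussian kernel, cluster prototypes kept in the input space), and the function names, the parameters `gamma` and `m`, and the toy data are all hypothetical. The construction of equilibrium grains and the SVM training are not shown.

```python
import numpy as np

def rbf_kernel(X, V, gamma):
    """Gaussian kernel K[i, k] = exp(-gamma * ||v_i - x_k||^2)."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def memberships(X, V, m, gamma):
    """Fuzzy memberships U[i, k] of sample k in cluster i.

    With a Gaussian kernel the squared feature-space distance is
    2 * (1 - K(x, v)), so memberships depend only on 1 - K.
    """
    dist = np.maximum(1.0 - rbf_kernel(X, V, gamma), 1e-12)  # avoid division by zero
    inv = dist ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def kernel_fcm(X, init_centers, m=2.0, gamma=0.5, n_iter=100, tol=1e-6):
    """Assumed variant: kernel fuzzy C-means with input-space prototypes."""
    V = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        U = memberships(X, V, m, gamma)
        W = (U ** m) * rbf_kernel(X, V, gamma)       # kernel-weighted memberships
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return memberships(X, V, m, gamma), V

# Toy data (hypothetical): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
X = np.vstack([blob_a, blob_b])

# Start with one prototype in each blob so the demo is deterministic.
U, V = kernel_fcm(X, init_centers=X[[0, -1]])
labels = U.argmax(axis=0)  # hard assignment: one candidate grain per cluster
```

In a granular SVM pipeline, each resulting cluster would serve as a candidate grain from which a compressed training set is drawn. How the grains are balanced between classes is specific to the paper's method and is not reproduced here.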

Full Text