Abstract

To avoid the “over-fitting” problem in protein function prediction based on protein-protein interactions (PPI), we propose a pattern recognition strategy in which the feature vectors derived from the PPI observation data are divided into three sets: a training set, a learning set and a testing set. The classifiers are trained on the training set, the receiver operating characteristic (ROC) curve and the optimal operating point (OOP) are computed on the learning set, and the accuracy at the OOP is reported on the testing set. Under this framework, we compare the performance of the logistic regression (LR) model with that of the kernel logistic regression (KLR) model on two feature sets derived from the PPI data: a 1-order feature set and a 2-order feature set. Experimental results on a standard PPI dataset show that KLR performs better than LR on the training sets of both the 1-order and the 2-order feature sets, and that, with KLR, the 2-order feature set outperforms the 1-order feature set on the training set. On the testing set, both LR and KLR reach a prediction accuracy of 95% with either the 1-order or the 2-order feature set.

Keywords: protein-protein interaction; logistic regression; kernel logistic regression; receiver operating characteristic; optimal operating point
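The three-set protocol can be illustrated with a minimal, dependency-free sketch. Several details are assumptions not stated in the abstract: the OOP is taken to be the learning-set threshold that maximises Youden's J (TPR - FPR), although the paper may define it differently (e.g. via misclassification costs); KLR uses an RBF kernel with the penalty (lam/2) alpha^T K alpha; both models are fitted by plain gradient descent; the 50/25/25 split, the hyperparameters and every function name (train_lr, train_klr, optimal_threshold, three_way_split, evaluate) are hypothetical; and the demo data are synthetic Gaussian blobs, not PPI data. The construction of the 1-order and 2-order features is not shown because the abstract does not define it.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, lr=0.5, epochs=300, lam=1e-3):
    """L2-regularised linear logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rbf(a, c, gamma):
    # Gaussian (RBF) kernel; the kernel choice is an assumption.
    return math.exp(-gamma * sum((ai - ci) ** 2 for ai, ci in zip(a, c)))


def train_klr(X, y, lr=0.5, epochs=300, lam=1e-3, gamma=0.5):
    """Kernel logistic regression: f(x) = sum_i alpha_i k(x_i, x) + b,
    minimising mean log-loss + (lam/2) alpha^T K alpha by gradient descent."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    for _ in range(epochs):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) + b for i in range(n)]
        err = [sigmoid(f[i]) - y[i] for i in range(n)]
        # Gradient w.r.t. alpha is K (err / n + lam * alpha) because K is symmetric.
        r = [err[i] / n + lam * alpha[i] for i in range(n)]
        alpha = [alpha[i] - lr * sum(K[i][j] * r[j] for j in range(n)) for i in range(n)]
        b -= lr * sum(err) / n
    return lambda x: sigmoid(sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X)) + b)


def optimal_threshold(scores, labels):
    """Sweep every distinct score as an ROC threshold and return the one that
    maximises TPR - FPR (Youden's J); this OOP definition is an assumption."""
    P = sum(labels)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("learning set must contain both classes")
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        j = tp / P - fp / N
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def accuracy(model, X, y, t):
    # Fraction of samples whose thresholded prediction matches the label.
    return sum((model(x) >= t) == (yi == 1) for x, yi in zip(X, y)) / len(y)


def three_way_split(X, y, fracs=(0.5, 0.25), seed=0):
    # Shuffle once, then cut into training / learning / testing sets.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    a = int(fracs[0] * len(idx))
    b = a + int(fracs[1] * len(idx))
    return [([X[i] for i in p], [y[i] for i in p]) for p in (idx[:a], idx[a:b], idx[b:])]


def evaluate(train_fn, X, y):
    (Xtr, ytr), (Xle, yle), (Xte, yte) = three_way_split(X, y)
    model = train_fn(Xtr, ytr)                            # fit on the training set only
    t = optimal_threshold([model(x) for x in Xle], yle)   # OOP from the learning set
    return accuracy(model, Xte, yte, t)                   # accuracy on the testing set


if __name__ == "__main__":
    # Synthetic two-class data standing in for PPI-derived features (not the paper's data).
    rng = random.Random(1)
    X = [[rng.gauss(m, 1.0), rng.gauss(m, 1.0)] for m in [0.0] * 40 + [2.5] * 40]
    y = [0] * 40 + [1] * 40
    print("LR  test accuracy:", evaluate(train_lr, X, y))
    print("KLR test accuracy:", evaluate(train_klr, X, y))
```

The design point the sketch tries to capture is that the decision threshold is chosen only on the learning set, so the testing-set accuracy is never influenced by data used for fitting or for threshold selection.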
