Abstract

Protein–protein interaction site (PPIS) prediction must deal with the diversity of interaction sites that limits their prediction accuracy. Use of proteins with unknown or unidentified interactions can also lead to missing interfaces. Such data errors are often brought into the training dataset. In response to these two problems, we used the minimum covariance determinant (MCD) method to refine the training data to build a predictor with better performance, utilizing its ability of removing outliers. In order to predict test data in practice, a method based on Mahalanobis distance was devised to select proper test data as input for the predictor. With leave-one-validation and independent test, after the Mahalanobis distance screening, our method achieved higher performance according to Matthews correlation coefficient (MCC), although only a part of test data could be predicted. These results indicate that data refinement is an efficient approach to improve protein–protein interaction site prediction. By further optimizing our method, it is hopeful to develop predictors of better performance and wide range of application.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call