Prediction of Hepatitis C Virus Non-Structural Proteins 5B Polymerase Inhibitors Using Machine Learning Methods

Lü Wei, Xue Ying

doi:10.3866/pku.whxb20110608

Abstract

Non-structural protein 5B (NS5B), an RNA-dependent RNA polymerase, plays an important role in protein maturation and gene replication in the hepatitis C virus (HCV). Inhibiting NS5B polymerase prevents RNA replication and is therefore significant for the treatment of HCV. Screening and predicting molecules with NS5B inhibitory activity by computational methods is becoming increasingly important. This work explores several machine learning (ML) methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) for the prediction of NS5B inhibitors (NS5BIs). The prediction system was tested using 1248 compounds (552 NS5BIs and 696 non-NS5BIs), which are significantly more diverse in chemical structure than those used in other studies. A feature selection method was used to improve the prediction accuracy and to select the molecular descriptors responsible for distinguishing between NS5BIs and non-NS5BIs. Across the three machine learning methods, the prediction accuracies were 81.4%-91.7% for the NS5BIs, 78.2%-87.2% for the non-NS5BIs, and 84.1%-85.0% overall. SVM gave the best accuracy for the NS5BIs (91.7%), C4.5 DT gave the best accuracy for the non-NS5BIs (87.2%), and k-NN gave the best overall accuracy for all the compounds (85.0%). This work suggests that machine learning methods can facilitate the prediction of NS5B inhibitory potential for unknown sets of compounds and help determine the molecular descriptors associated with NS5BIs.
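The three accuracies quoted above (for NS5BIs, for non-NS5BIs, and overall) can be illustrated with a minimal sketch. It assumes the standard definitions of sensitivity, specificity, and overall accuracy from a binary confusion matrix; the abstract does not spell out its formulas, so this is an assumption. The function name `class_accuracies` and the counts in the example are hypothetical and are not results from the paper.

```python
def class_accuracies(tp, fn, tn, fp):
    """Return (sensitivity, specificity, overall) as percentages.

    tp / fn: inhibitors (NS5BIs) predicted correctly / incorrectly
    tn / fp: non-inhibitors predicted correctly / incorrectly
    Assumes the standard confusion-matrix definitions.
    """
    positives = tp + fn  # all true NS5BIs in the test set
    negatives = tn + fp  # all true non-NS5BIs in the test set
    if positives == 0 or negatives == 0:
        raise ValueError("both classes need at least one compound")
    sensitivity = 100.0 * tp / positives   # accuracy for NS5BIs
    specificity = 100.0 * tn / negatives   # accuracy for non-NS5BIs
    overall = 100.0 * (tp + tn) / (positives + negatives)
    return sensitivity, specificity, overall


# Hypothetical counts for illustration only (not data from the paper).
se, sp, q = class_accuracies(tp=90, fn=10, tn=80, fp=20)
print(f"NS5BIs: {se:.1f}%  non-NS5BIs: {sp:.1f}%  overall: {q:.1f}%")
```

Reporting the two per-class accuracies alongside the overall figure matters when the classes are unbalanced, as they are here (552 versus 696 compounds), because a model can score well overall while favoring one class.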
