In the present study, classification and correlation techniques of diverse nature were successfully employed to develop models for predicting human immunodeficiency virus (HIV) integrase inhibitory activity, using a dataset of 50 quinolone carboxylic acid analogs. The values of the molecular descriptors (MDs) for each analog in the dataset were computed using the MDS V-life science QSAR plus module; the values of MDs that are not part of MDS V-life science were computed using an in-house computer program. A decision tree (DT) was constructed for HIV integrase inhibitory activity to determine the importance of the MDs. The DT learned the information from the input data with an accuracy of 98% and correctly predicted the 10-fold cross-validated data with an accuracy of 96%. Three MDs, the E-state contribution descriptor (SssOHE), the molecular connectivity topochemical index ($\chi^{\rm A}$), and the eccentric connectivity topochemical index ($\xi_{\rm C}^{\rm C}$), were used to develop models using moving average analysis (MAA). The classification accuracy of the single-descriptor MAA models ranged from 96% to 98%. The statistical significance of the models was assessed through specificity, sensitivity, overall accuracy, Matthews correlation coefficient, and intercorrelation analysis. Widely used methods, namely multiple linear regression, partial least squares, and principal component regression, were employed to develop correlation models on a training set of 36 molecules. The models had a correlation coefficient (r²) of 0.86 to 0.92, a significant cross-validated correlation coefficient (q²) of 0.79 to 0.85, F-test values from 63.2 to 93.06, an r² for the external test set (pred_r²) from 0.69, a coefficient of correlation of the predicted dataset (pred_r²se) of 0.77, and degrees of freedom from 27 to 30.
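The classification statistics named above (sensitivity, specificity, overall accuracy, and Matthews correlation coefficient) all derive from the 2×2 confusion matrix of predicted versus observed classes. The following is a minimal sketch, assuming binary labels (1 = active, 0 = inactive); the function name and the label lists are hypothetical illustrations, not data or code from the study.

```python
import math


def classification_stats(y_true, y_pred):
    """Sensitivity, specificity, accuracy and Matthews correlation
    coefficient for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actives predicted active
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inactives predicted inactive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives predicted active
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actives predicted inactive

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    # MCC is defined as 0 when any marginal total of the matrix is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical observed and predicted classes for ten compounds.
    observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in classification_stats(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Unlike overall accuracy, MCC uses all four cells of the matrix, so it stays informative when active and inactive classes are unequal in size.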
The alignment-independent descriptors SsOHE index, SaaCHE index, SssCH2, and XlogP were found to be the most important descriptors for developing correlation models to predict HIV integrase inhibitory activity.
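For the multiple linear regression models, r², q², and the F statistic can be illustrated with a small pure-Python sketch. It assumes that q² is computed by leave-one-out cross-validation (q² = 1 − PRESS/SS_tot), which the abstract does not specify; partial least squares and principal component regression are not shown. All function names and descriptor values below are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(A[i][j] * A[i][k] for i in range(len(A))) for k in range(p)] for j in range(p)]
    xty = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def loo_q2(X, y):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(y)):
        beta = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta, X[i])) ** 2
    mean = sum(y) / len(y)
    return 1.0 - press / sum((obs - mean) ** 2 for obs in y)


def f_stat(r2, n, k):
    """F statistic for n compounds and k descriptors (n - k - 1 residual degrees of freedom)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


if __name__ == "__main__":
    # Hypothetical values of two descriptors and activities for eight compounds.
    X = [[0.1, 1.2], [0.4, 0.8], [0.9, 1.5], [1.3, 0.3],
         [1.7, 1.1], [2.2, 0.6], [2.6, 1.9], [3.0, 0.2]]
    y = [4.9, 5.1, 5.6, 6.0, 6.1, 6.6, 6.9, 7.3]
    beta = fit_mlr(X, y)
    r2 = r_squared(y, [predict(beta, row) for row in X])
    print("coefficients:", [round(b, 3) for b in beta])
    print(f"r2 = {r2:.3f}, q2 = {loo_q2(X, y):.3f}, F = {f_stat(r2, len(y), 2):.2f}")
```

With n compounds and k descriptors, the residual degrees of freedom reported alongside F are n − k − 1.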