Introduction of rm2(rank) metric incorporating rank-order predictions as an additional tool for validation of QSAR/QSPR models

Kunal Roy, Indrani Mitra, Probir Kumar Ojha, Supratik Kar, Rudra Narayan Das, Humayun Kabir

doi:10.1016/j.chemolab.2012.06.004

Abstract

In silico techniques involving the development of quantitative regression models have been extensively used to predict the activity, property and toxicity of new chemicals. The acceptability and subsequent applicability of such models for prediction are judged on the basis of several internal and external validation statistics. Among the different validation metrics, Q2 and R2pred are the classical metrics for internal and external validation, respectively. Additionally, the rm2 metrics introduced by Roy and coworkers have been widely used by several groups of authors to verify close agreement between the predicted and observed response data. However, none of the currently available and commonly used validation metrics provides any information regarding the rank-order predictions for the test set. Thus, to incorporate rank-order predictions into the common validation metrics, which were originally calculated using a Pearson's correlation coefficient-based algorithm, the rm2(rank) metric has been introduced in this work as a new variant of the rm2 series of metrics. The ability of this new metric to reflect rank-order predictions was assessed by applying it to judge the quality of predictions of regression-based quantitative structure–activity/property relationship (QSAR/QSPR) models for four different data sets. The different validation metrics calculated in each case were compared for their ability to reflect the rank-order predictions based on their correlation with the conventional Spearman's rank correlation coefficient. A sum of ranking differences analysis using the Spearman's rank correlation coefficient as the reference showed that the rm2(rank) metric exhibited the smallest difference in ranking from the reference metric.
Thus, the close correlation of the rm2(rank) metric with the Spearman's rank correlation coefficient indicated that the new metric can aptly capture the rank-order predictions for the test set and can be used as an additional validation tool, besides the conventional metrics, for assessing the acceptability and predictive ability of a QSAR/QSPR model.
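The abstract does not give the formulas, so the following is only a minimal Python sketch under stated assumptions. It takes rm2 as r2 × (1 - sqrt(|r2 - r02|)), where r2 is the squared Pearson correlation between observed and predicted values and r02 is the determination coefficient of the observed values regressed on the predictions through the origin. rm2(rank) is assumed here to be that same formula applied to the ranks of the observed and predicted test-set values. Spearman's coefficient is computed as the Pearson correlation of ranks (average ranks for ties), and the sum of ranking differences (SRD) is shown in its simplest, unnormalized form. All function names are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r0_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Determination coefficient of obs regressed on pred through the origin."""
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rm2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Assumed form: r2 * (1 - sqrt(|r2 - r02|)); abs() guards against r02 > r2."""
    r2 = pearson_r(obs, pred) ** 2
    r02 = r0_squared(obs, pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    return pearson_r(rank(obs), rank(pred))


def rm2_rank(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Hypothetical rm2(rank): the rm2 formula applied to ranks instead of values."""
    return rm2(rank(obs), rank(pred))


def srd(metric_scores: Sequence[float], reference_scores: Sequence[float]) -> float:
    """Unnormalized SRD: rank the models by each score, sum the absolute rank gaps."""
    return sum(abs(a - b) for a, b in zip(rank(metric_scores), rank(reference_scores)))


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    good = [1.1, 1.9, 3.2, 3.9, 5.1]     # same rank order as obs
    swapped = [1.1, 3.2, 1.9, 3.9, 5.1]  # compounds 2 and 3 swapped in rank
    print(spearman_rho(obs, good), rm2_rank(obs, good))
    print(spearman_rho(obs, swapped), rm2_rank(obs, swapped))
```

With a perfect rank order both rank-based metrics equal 1. Swapping two neighbouring compounds lowers Spearman's coefficient to 0.9 and the rank-based rm2 to about 0.74 in this sketch, because rm2 additionally penalizes the gap between r2 and r02. For SRD, each entry of `metric_scores` and `reference_scores` is assumed to be the score one metric assigns to one model.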

Full Text