Abstract

Quantitative structure–activity relationship (QSAR) models have long been used for making predictions and filling data gaps in diverse fields including medicinal chemistry, predictive toxicology, environmental fate modeling, materials science, agricultural science, nanoscience, food science, and so forth. Usually, a QSAR model is developed from the chemical information of a properly designed training set and the corresponding experimental response data, and the model is validated using one or more test sets for which experimental response data are available. However, it is of interest to estimate the reliability of predictions when the model is applied to a completely new data set (a true external set), even when the new data points lie within the applicability domain (AD) of the developed model. In the present study, we have categorized the quality of predictions for the test set or true external set into three groups (good, moderate, and bad) based on absolute prediction errors. We then used three criteria [(a) the mean absolute error of leave-one-out predictions for the 10 training compounds closest to each query molecule; (b) the AD in terms of similarity based on the standardization approach; and (c) the proximity of the predicted value of the query compound to the mean training response] under different weighting schemes to build a composite score for each prediction. Working with 5 different datasets, with 15 models for each set derived using three different splitting techniques, we found that with the most frequently appearing weighting scheme, 0.5–0–0.5, the composite score-based categorization showed concordance with the absolute prediction error-based categorization for more than 80% of the test data points. These observations were also confirmed with true external sets for another four endpoints, suggesting that the scheme can be used to judge the reliability of predictions for new datasets.
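The composite-score idea described above can be sketched in Python for a multiple linear regression (MLR) model. This is a minimal illustration under stated assumptions, not the authors' code or the code behind the Prediction Reliability Indicator tool. All function names are hypothetical. Assumptions: closeness for criterion (a) is Euclidean distance in descriptor space; criterion (b) applies the standardization approach directly to descriptor values as a true/false AD check (the study describes a similarity-based variant); the weighting scheme 0.5–0–0.5 is read as the weights of criteria (a), (b), (c) in that order; and the score scalings and the good/moderate/bad cut-offs are made-up values chosen only for the example. Leave-one-out predictions use the ordinary least-squares hat-matrix identity, so the model is not refit n times.

```python
import numpy as np


def loo_predictions_mlr(X, y):
    """Leave-one-out (LOO) predictions of an OLS multiple linear regression model.

    Uses the hat-matrix identity e_(i) = e_i / (1 - h_ii), so no refitting is needed.
    """
    A = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    h = np.diag(A @ np.linalg.pinv(A.T @ A) @ A.T)  # leverages h_ii
    return y - resid / (1.0 - h)


def criterion_a(X_train, y_train, x_query, k=10):
    """(a) MAE of LOO predictions for the k training compounds closest to the query.

    ASSUMPTION: closeness = Euclidean distance in descriptor space.
    """
    loo = loo_predictions_mlr(X_train, y_train)
    nearest = np.argsort(np.linalg.norm(X_train - x_query, axis=1))[:k]
    return float(np.mean(np.abs(y_train[nearest] - loo[nearest])))


def criterion_b(X_train, x_query):
    """(b) Standardization-approach AD check (True = inside the AD).

    ASSUMPTION: applied to raw descriptor values, not to similarity values.
    """
    s = np.abs(x_query - X_train.mean(axis=0)) / X_train.std(axis=0, ddof=1)
    if s.max() <= 3:
        return True
    if s.min() > 3:
        return False
    return bool(s.mean() + 1.28 * s.std(ddof=1) <= 3)


def criterion_c(y_train, y_pred_query):
    """(c) Distance of the prediction from the mean training response, scaled by the range."""
    return float(abs(y_pred_query - y_train.mean()) / np.ptp(y_train))


def composite_category(X_train, y_train, x_query, y_pred_query, weights=(0.5, 0.0, 0.5)):
    """Weighted composite score (in [0, 1] when weights sum to 1; 1 = most reliable).

    ASSUMPTION: weights are ordered (a, b, c).
    HYPOTHETICAL: the scalings of s_a and s_c and the cut-offs 2/3 and 1/3.
    """
    y_range = np.ptp(y_train)
    s_a = max(0.0, 1.0 - criterion_a(X_train, y_train, x_query) / (0.2 * y_range))
    s_b = 1.0 if criterion_b(X_train, x_query) else 0.0
    s_c = max(0.0, 1.0 - 2.0 * criterion_c(y_train, y_pred_query))
    score = weights[0] * s_a + weights[1] * s_b + weights[2] * s_c
    category = "good" if score >= 2 / 3 else "moderate" if score >= 1 / 3 else "bad"
    return score, category


def error_category(y_obs, y_pred, y_train, good=0.1, bad=0.2):
    """Category from |error| relative to the training range (HYPOTHETICAL cut-offs)."""
    ae = abs(y_obs - y_pred) / np.ptp(y_train)
    return "good" if ae <= good else "bad" if ae > bad else "moderate"


def concordance(cats_a, cats_b):
    """Fraction of query compounds on which two categorizations agree."""
    return float(np.mean([a == b for a, b in zip(cats_a, cats_b)]))


# Usage on synthetic data (not data from the study)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.normal(size=40)
X_tr, y_tr, X_te, y_te = X[:30], y[:30], X[30:], y[30:]
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(30), X_tr]), y_tr, rcond=None)
y_hat = coef[0] + X_te @ coef[1:]
by_score = [composite_category(X_tr, y_tr, x, p)[1] for x, p in zip(X_te, y_hat)]
by_error = [error_category(o, p, y_tr) for o, p in zip(y_te, y_hat)]
print(f"concordance: {concordance(by_score, by_error):.0%}")
```

Passing other `weights`, for example (1/3, 1/3, 1/3), shows how the concordance shifts between schemes; on synthetic data the percentage says nothing about the figure reported in the study.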
The scheme has been implemented in a tool “Prediction Reliability Indicator” available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/, and the tool is presently valid for multiple linear regression models only.

Highlights

  • Quantitative structure–activity relationship (QSAR) models are used as popular tools for prediction of response data of chemicals to bridge data gaps.[1]

  • The statistical quality of QSAR models is typically judged by a series of quality metrics, while the quality of predictions is examined by methods such as cross-validation, test set validation, Y-randomization, and so forth; the results are expressed in terms of different validation metrics, for which different threshold values have been reported in the literature.[3,4]

  • The success of any QSAR model lies in precisely predicting a true external test set, one that has been used neither during model development nor in the validation stage.
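Y-randomization, one of the checks named above, can be sketched for an MLR model: the responses are shuffled, the model is refit, and the fit statistic is compared with that of the original model. This is a generic minimal version, not the procedure used in the study; the function names, the number of iterations, and the use of the mean randomized R² as the summary are assumptions, and no acceptance threshold from the literature is implied.

```python
import numpy as np


def r2_mlr(X, y):
    """R^2 of an ordinary least-squares MLR fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def y_randomization(X, y, n_iter=100, seed=0):
    """Refit on randomly permuted responses; return (true R^2, mean randomized R^2).

    A model that captures a real structure-response relationship should score
    far higher than models fitted to scrambled responses.
    ASSUMPTION: n_iter and the mean as summary statistic are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    r2_true = r2_mlr(X, y)
    r2_rand = [r2_mlr(X, rng.permutation(y)) for _ in range(n_iter)]
    return r2_true, float(np.mean(r2_rand))


# Usage on synthetic data (not data from the study)
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=30)
r2_true, r2_rand_mean = y_randomization(X, y)
print(f"R2 = {r2_true:.3f}, mean randomized R2 = {r2_rand_mean:.3f}")
```

With three descriptors and 30 compounds, scrambled fits still reach a small nonzero R² by chance, which is why the comparison is against randomized models rather than against zero.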


Summary

Introduction

Quantitative structure–activity relationship (QSAR) models are used as popular tools for predicting the response data of chemicals to bridge data gaps.[1] A model with respectable values of different correlation coefficients (R², Q²LOO, Q²ext-F1, Q²ext-F2, Q²ext-F3, r²m, etc.) and/or error measures [mean absolute prediction error (MAE), root mean square error of prediction (RMSEP), etc.][5] is not necessarily expected to perform well when predicting the response of a new query chemical. This is because QSAR models are usually developed using rather limited datasets. At the same time, …

Received: July 13, 2018. Accepted: September 6, 2018. Published: September 19, 2018.
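Several of the metrics listed in the introduction have standard closed forms. The sketch below computes Q²LOO for an MLR model and Q²ext-F1, Q²ext-F2, Q²ext-F3, MAE and RMSEP for a test set, following the commonly used definitions: F1 and F2 differ only in whether the training-set or the test-set mean serves as the reference value, and F3 compares the mean squared prediction error with the training-set variance. r²m is omitted because it additionally requires regression through the origin. The function names are hypothetical, and this is an illustration rather than the code used in the study.

```python
import numpy as np


def q2_loo_mlr(X, y):
    """Q^2_LOO = 1 - PRESS / SS_total for an OLS MLR model.

    LOO residuals come from the hat-matrix shortcut e_i / (1 - h_ii).
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    h = np.diag(A @ np.linalg.pinv(A.T @ A) @ A.T)
    press = np.sum(((y - A @ coef) / (1.0 - h)) ** 2)
    return float(1.0 - press / np.sum((y - y.mean()) ** 2))


def external_metrics(y_train, y_test, y_pred):
    """Test-set validation metrics from observed and predicted responses."""
    press = np.sum((y_test - y_pred) ** 2)
    n_train, n_test = len(y_train), len(y_test)
    ss_train = np.sum((y_train - y_train.mean()) ** 2)
    return {
        # reference value: training-set mean
        "Q2_F1": float(1.0 - press / np.sum((y_test - y_train.mean()) ** 2)),
        # reference value: test-set mean
        "Q2_F2": float(1.0 - press / np.sum((y_test - y_test.mean()) ** 2)),
        # mean squared prediction error relative to the training-set variance
        "Q2_F3": float(1.0 - (press / n_test) / (ss_train / n_train)),
        "MAE": float(np.mean(np.abs(y_test - y_pred))),
        "RMSEP": float(np.sqrt(press / n_test)),
    }


# Usage with small hand-checkable numbers
print(external_metrics(np.array([0.0, 2.0, 4.0]), np.array([1.0, 3.0]), np.array([2.0, 3.0])))
```

For these numbers PRESS = 1, so Q²ext-F1 = Q²ext-F2 = 0.5, Q²ext-F3 = 1 - (1/2)/(8/3) = 0.8125, MAE = 0.5 and RMSEP = √0.5 ≈ 0.707, up to floating-point rounding.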
