Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions

Faizan Sahigara, Roberto Todeschini, Davide Ballabio, Viviana Consonni

doi:10.1186/1758-2946-5-27

Open Access

https://doi.org/10.1186/1758-2946-5-27

Journal: Journal of Cheminformatics
Publication Date: May 30, 2013
Citations: 98
License type: CC BY 2.0

Affiliation: University of Milano-Bicocca

Abstract

Background: With the growing use of QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, and an essential part of that validation is a clearly defined Applicability Domain (AD) for the model. Although several different approaches have been proposed in recent years to address this goal, no optimal approach to defining a model’s AD has yet been recognized.

Results: This study proposes a novel descriptor-based AD method that accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure that addresses several key aspects relevant to judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage defines a pattern of thresholds corresponding to the individual training samples; these thresholds are later used to derive the decision rule. The second stage defines the criterion that decides whether a given test sample is retained within the AD. Finally, the last stage reflects on the reliability of the derived results, taking model statistics and prediction error into account.

Conclusions: The proposed approach introduces a novel strategy that integrates the kNN principle to define the AD of QSAR models. Relevant features that characterize the proposed AD approach include: a) adaptability to the local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness also in high-dimensional spaces; c) low sensitivity to the smoothing parameter k; and d) versatility in implementing various distance measures. The results of a case study provided a clear understanding of how the approach works and how it defines the model’s AD for reliable predictions.

Highlights

With the growing use of QSAR predictions for regulatory purposes, such predictive models are required to be strictly validated, and an essential part of that validation is a clearly defined Applicability Domain (AD) for the model
A novel k-Nearest Neighbours (kNN)-based approach to define the AD of QSAR models was proposed
The approach was carried out in three phases that used the salient features of the kNN principle to define a model’s AD in its descriptor space
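
The first two phases (per-sample thresholds, then a decision rule for test samples) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fit_thresholds` and `in_domain`, the use of Euclidean distance, the box-plot style reference distance (third quartile plus 1.5 times the interquartile range), and the fallback for isolated samples are all assumptions made here for illustration, since this summary does not give those details. The third phase, which relates the results to model statistics and prediction error, is not covered.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def fit_thresholds(X_train, k=3):
    """Phase 1 (assumed form): one distance threshold per training sample."""
    d = pairwise_distances(X_train, X_train)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    d_sorted = np.sort(d, axis=1)

    # Mean distance of each training sample to its k nearest neighbours.
    mean_knn = d_sorted[:, :k].mean(axis=1)

    # Assumed global reference distance: Q3 + 1.5 * IQR of those means.
    q1, q3 = np.percentile(mean_knn, [25, 75])
    ref = q3 + 1.5 * (q3 - q1)

    # Local threshold: mean distance to the neighbours lying within ref.
    thresholds = np.full(len(X_train), np.nan)
    for i, row in enumerate(d_sorted):
        close = row[row <= ref]
        if close.size:
            thresholds[i] = close.mean()

    # Assumed fallback: isolated samples get the smallest positive threshold,
    # or ref itself if no positive threshold exists.
    missing = np.isnan(thresholds)
    found = thresholds[~missing]
    positive = found[found > 0]
    thresholds[missing] = positive.min() if positive.size else ref
    return thresholds


def in_domain(X_test, X_train, thresholds):
    """Phase 2 (assumed form): a test sample is inside the AD if it lies
    within the threshold of at least one training sample."""
    d = pairwise_distances(X_test, X_train)
    return (d <= thresholds[None, :]).any(axis=1)


# Tiny demo on a regular 5x5 grid of hypothetical 2-D descriptors.
X_train = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
thr = fit_thresholds(X_train, k=3)
X_test = np.array([[2.5, 2.5], [10.0, 10.0]])
print(in_domain(X_test, X_train, thr))
```

Because each training sample carries its own threshold, dense regions of the descriptor space end up with tight acceptance radii and sparse regions with looser ones, which is one way to read the "adaptability to local density" mentioned in the abstract. Another distance measure could be swapped in by replacing `pairwise_distances`.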

Summary

Introduction

With the growing use of QSAR predictions for regulatory purposes, such predictive models are required to be strictly validated, and an essential part of that validation is a clearly defined Applicability Domain (AD) for the model. The popularity of QSARs has grown over time, helped by the availability of more sophisticated and efficient model development techniques and further supported by the consideration of QSAR predictions for regulatory purposes. The existing literature has often emphasized validating QSAR models to demonstrate their robustness and predictive ability. In 2004, the following five OECD principles were adopted for validating a QSAR model for regulatory consideration: a) a defined endpoint; b) an unambiguous algorithm; c) a defined domain of applicability; d) appropriate measures of goodness-of-fit, robustness and predictivity; and e) a mechanistic interpretation, if possible [3].

Methods

Results

Conclusion
