Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features

Dongsheng Cao, Yizeng Liang, Hongdong Li, Yifeng Yun, Qingsong Xu

doi:10.1007/s10822-010-9401-1

Abstract

Building a robust and reliable QSAR/QSPR model requires attention to two tasks: selecting the optimal variable subset from a large pool of molecular descriptors and detecting outliers among the samples. The two problems are similar and, to some extent, complementary. For a given learning algorithm and data set, one should consider how variable selection and outlier detection interact. In this paper, we describe a consistent methodology for performing variable subset selection and outlier detection simultaneously, based on statistical distributions that can be simulated by building many cross-predictive linear models. The approach exploits the fact that the distribution of linear model coefficients provides a mechanism for ranking and interpreting the effects of variables, while the distribution of prediction errors provides a mechanism for differentiating outliers from normal samples. Statistics of these distributions, namely the mean and the standard deviation, provide a feasible way to describe the information contained in the original samples. Several examples demonstrate the predictive ability of the proposed approach through comparison of different approaches and their combinations.
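
The idea of simulating both distributions from many cross-predictive linear models can be illustrated with a minimal sketch. It is not the authors' implementation: ordinary least squares stands in for whatever linear learner the paper uses, the data are synthetic, and the function names, the 70/30 random split, the number of models, and the scoring rules (|mean| / standard deviation for variables, largest |mean error| for samples) are all assumptions made for illustration.

```python
import numpy as np


def cross_predictive_distributions(X, y, n_models=500, train_frac=0.7, seed=0):
    """Fit many linear models on random training subsets (hypothetical helper).

    Returns the coefficients of every model and the prediction error of every
    held-out sample (NaN where a sample was used for training).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))
    coefs = np.empty((n_models, p))
    errors = np.full((n_models, n), np.nan)
    for m in range(n_models):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Ordinary least squares with an intercept column (assumed learner).
        A = np.column_stack([np.ones(n_train), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        coefs[m] = beta[1:]
        errors[m, test] = y[test] - (beta[0] + X[test] @ beta[1:])
    return coefs, errors


def summarize(coefs, errors):
    """Mean and standard deviation of both simulated distributions."""
    coef_mean = coefs.mean(axis=0)
    coef_std = coefs.std(axis=0, ddof=1)
    err_mean = np.nanmean(errors, axis=0)
    err_std = np.nanstd(errors, axis=0, ddof=1)
    return coef_mean, coef_std, err_mean, err_std


# Synthetic data (assumption): only variables 0 and 1 matter, sample 7 is an outlier.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=60)
y[7] += 10.0

coefs, errors = cross_predictive_distributions(X, y)
coef_mean, coef_std, err_mean, err_std = summarize(coefs, errors)

# Rank variables by how far the coefficient distribution sits from zero.
variable_score = np.abs(coef_mean) / coef_std
variable_ranking = np.argsort(-variable_score)

# Flag the sample whose held-out prediction errors are largest on average.
outlier_candidate = int(np.argmax(np.abs(err_mean)))

print("variable ranking:", variable_ranking)
print("outlier candidate:", outlier_candidate)
```

In this sketch a variable whose coefficient distribution has a large mean relative to its spread ranks as influential, and a sample whose prediction-error distribution is centered far from zero is a candidate outlier. In practice one would remove flagged samples and refit, since an outlier left in the training subsets also distorts the coefficient distributions.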

Journal: Journal of Computer-Aided Molecular Design
Publication Date: Nov 13, 2010
Citations: 35

Similar Papers

Ensemble partial least squares regression for descriptor selection, outlier detection, applicability domain assessment, and ensemble modeling in QSAR/QSPR modeling
Dong‐Sheng Cao ... Rui‐Gang Zhao
Journal of Chemometrics | VOL. 31
18 Jul 2017

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection
Jong Suk Park ... Chun Gun Park
Communications for Statistical Applications and Methods | VOL. 26
31 Mar 2019

Use of Orthogonal Factors for Selection of Variables in a Regression Equation-An Illustration
Janet R Daling ... H Tamura
Applied Statistics | VOL. 19
01 Jan 1970

Genetic algorithms for outlier detection and variable selection in linear regression models
J Tolvi
Soft Computing | VOL. 8
07 Oct 2003
