Abstract

The application of several alternative approaches for building linear regression equations in tasks connected with the description of physicochemical parameters of molecules is described. The Ordinary Least Squares, the Least Absolute Deviation, and the Orthogonal Distances methods are among the chosen approaches. For tasks affected by multicollinearity of the predictor set, principal component regression and L2-regularization have been applied. Special attention has been given to approaches that make it possible to reduce the number of predictors (L1-regularization, the Least Angle Regression method). For data with noticeable errors in both the dependent and independent variables, the orthogonal distances method has been examined as an alternative to the least squares approach. The adequacy of the previously investigated least absolute deviation of orthogonal distances (LADOD) method has been demonstrated.

Highlights

  • More than two hundred years ago the ordinary least squares (OLS) method, which is a cornerstone of contemporary experimental investigations, was developed in the works of Gauss and Legendre

  • Regression analysis plays a significant role in the construction of QSAR (quantitative structure-activity relationship) equations

  • To validate the results obtained in calculations without the descriptor MLER_L, we selected a test sample consisting of 10 molecules; the other 33 molecules were used as a training set to build models with the principal component regression (PCR), OLS, and least absolute deviation (LAD) methods
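The hold-out validation scheme described in this highlight can be sketched as follows. This is a minimal illustration, not the authors' actual data set: the descriptor matrix, coefficients, and noise level are invented stand-ins, and only the OLS fit is shown (PCR and LAD would follow the same split).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a 43-molecule data set: 43 samples, 3 descriptors.
# The coefficients and noise level are illustrative, not taken from the paper.
X = rng.normal(size=(43, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.05 * rng.normal(size=43)

# Hold out 10 molecules as the test sample; train on the remaining 33.
idx = rng.permutation(43)
test, train = idx[:10], idx[10:]

# OLS fit on the training set only.
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Predictive ability is judged on the held-out molecules.
rmse = np.sqrt(np.mean((X[test] @ beta - y[test]) ** 2))
```

A low test RMSE (here on the order of the injected noise) indicates the model generalizes beyond the training molecules, which is the point of keeping the 10-molecule sample out of the fit.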


Summary

Introduction

More than two hundred years ago, the ordinary least squares (OLS) method, which is a cornerstone of contemporary experimental investigations, was developed in the works of Gauss and Legendre (in the present article the OLS is treated as the simplest approach for building a regression equation). In ridge (L2-regularized) regression, the "strength" of the regularizing factor in (8) is determined by a parameter λ > 0. This method solves the problem of explicit (or implicit) inversion of the matrix (XᵀX) in (4), even when it is ill-conditioned or degenerate. Function (10) is similar to (8), but here the regularization factor is the sum of the absolute values of the regression parameters β (the L1 penalty); such regularization guarantees shrinkage of the descriptor set when λ > 0. In the elastic net (EN), both the (8) and (10) regularization factors are included in the minimization function [16]. This variant of regression is characterized by numerical stability in the initial stages of calculation, when the set of descriptors is still large and can be multicollinear. For a detailed discussion of the predictive ability of QSAR models, see refs. [24,25].
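The claim that L2-regularization makes the inversion of (XᵀX) well-posed can be illustrated with the closed-form ridge estimate β = (XᵀX + λI)⁻¹Xᵀy. The sketch below uses invented data with two nearly identical descriptors, so XᵀX is close to singular and plain OLS inversion would be numerically fragile, while the regularized system remains well-conditioned for any λ > 0.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two nearly collinear predictors: x2 differs from x1 only by tiny noise,
# so X'X is ill-conditioned (almost degenerate).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

# With lam > 0 the system is well-posed: the coefficients stay bounded
# and the collinear pair shares the effect roughly evenly.
beta_ridge = ridge_fit(X, y, lam=1.0)
```

Unlike the L1 penalty of (10), this L2 penalty shrinks coefficients toward zero without setting any of them exactly to zero, which is why it stabilizes the inversion but does not by itself reduce the descriptor set.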

Numerical Results
Method
Conclusion
