Abstract

The quantitative structure-activity relationship (QSAR) model searches for a reliable relationship between chemical structure and biological activities in the field of drug design and discovery. (1) Background: In QSAR studies, the chemical structures of compounds are encoded by a large number of descriptors. Redundant, noisy and irrelevant descriptors have an adverse effect on the QSAR model, and too many descriptors can lead to overfitting or to a low correlation between chemical structure and biological activity. (2) Methods: We use a novel log-sum regularization to select a small number of descriptors that are relevant to biological activities. In addition, we develop a coordinate descent algorithm for the QSAR model that updates the estimated coefficients with a novel univariate log-sum thresholding operator. (3) Results: Experimental results on artificial data and four QSAR datasets demonstrate that our proposed log-sum method performs well compared with state-of-the-art methods. (4) Conclusions: Our proposed multiple linear regression with a log-sum penalty is an effective technique for both descriptor selection and prediction of biological activity.
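To make the Methods concrete, here is a minimal sketch of coordinate descent with a log-sum penalty. It assumes the penalised objective (1/2n)||y - Xβ||² + λ Σ_j log(|β_j| + ε) with a small ε > 0 and a centred response (no intercept). The univariate thresholding rule is derived here by solving the stationarity quadratic for β_j and comparing the result with β_j = 0; the closed-form operator used in the paper may differ. The names logsum_threshold and logsum_cd are hypothetical.

```python
import numpy as np


def logsum_threshold(z: float, lam: float, eps: float) -> float:
    """Scalar minimiser of 0.5*(b - z)**2 + lam*log(|b| + eps) (hypothetical helper).

    For b > 0 the stationarity condition b - |z| + lam/(b + eps) = 0 becomes
    the quadratic b**2 + (eps - |z|)*b + (lam - |z|*eps) = 0. Because the
    penalty is non-convex, its larger root is compared against b = 0.
    """
    az = abs(z)
    disc = (az + eps) ** 2 - 4.0 * lam  # discriminant of the quadratic
    if disc < 0.0:
        return 0.0  # no stationary point: objective increases on b >= 0
    b = 0.5 * ((az - eps) + np.sqrt(disc))  # larger root = local minimum
    if b <= 0.0:
        return 0.0

    def objective(t: float) -> float:
        return 0.5 * (t - az) ** 2 + lam * np.log(t + eps)

    # Keep the non-zero candidate only if it beats the zero solution.
    return float(np.copysign(b, z)) if objective(b) < objective(0.0) else 0.0


def logsum_cd(X, y, lam, eps=1e-3, max_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * sum log(|beta_j| + eps).

    Assumes a centred y (no intercept). The result is a stationary point,
    not necessarily the global minimum, because the penalty is non-convex.
    """
    n, p = X.shape
    beta = np.zeros(p)
    a = (X ** 2).sum(axis=0) / n  # a_j = ||x_j||^2 / n
    r = np.asarray(y, dtype=float).copy()  # residual y - X @ beta (beta = 0)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if a[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) x_j^T times the partial residual that excludes descriptor j
            z = X[:, j] @ r / n + a[j] * beta[j]
            new = logsum_threshold(z / a[j], lam / a[j], eps)
            step = new - beta[j]
            if step != 0.0:
                r -= step * X[:, j]  # keep the residual in sync with beta
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta
```

Because the penalty is non-convex, the selected descriptors can depend on the starting point and on the order in which coordinates are visited; this sketch simply starts from β = 0.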

Highlights

  • The quantitative structure-activity relationship (QSAR) model searches for a reliable relationship between the chemical structure and biological activities in the field of drug design and discovery [1]

  • In the field of drug design and discovery, only a few descriptors are of interest to the QSAR model

  • Descriptor selection plays an important role in the study of QSAR


Summary

Introduction

The quantitative structure-activity relationship (QSAR) model searches for a reliable relationship between the chemical structure and biological activities in the field of drug design and discovery [1]. In QSAR studies, descriptor selection techniques can be categorized into four groups: classical methods, artificial intelligence-based methods, miscellaneous methods and regularization methods. Several classical methods have been proposed for QSAR; for example, forward selection adds the most significant descriptors one at a time until no remaining descriptor improves the model to a statistically significant extent. A coordinate descent algorithm, which uses a novel univariate log-sum thresholding operator to update the estimated coefficients, has been developed for the QSAR model. Experimental results on artificial data and four QSAR datasets demonstrate that the proposed log-sum method performs well compared with state-of-the-art methods. The rest of this paper is organized as follows: Section 2 introduces the coordinate descent algorithm with univariate log-sum thresholding and gives a detailed description of the datasets.
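The classical forward-selection procedure mentioned above can be sketched as follows. This is a minimal illustration, not a method from the paper: it assumes an ordinary least-squares fit with an intercept and uses a fixed partial-F "F-to-enter" threshold of 4.0 (roughly the 5% level for large samples) instead of an exact significance test. The name forward_selection and the f_to_enter parameter are hypothetical.

```python
import numpy as np


def forward_selection(X, y, f_to_enter=4.0):
    """Greedy forward selection with a partial-F entry rule (hypothetical helper)."""
    n, p = X.shape
    selected = []

    def rss(cols):
        # Least-squares fit with an intercept on the given descriptor columns.
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        return float(resid @ resid)

    current = rss(selected)
    while len(selected) < p:
        df_resid = n - len(selected) - 2  # intercept + selected + candidate
        if df_resid <= 0:
            break
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: rss(selected + [j]) for j in candidates}
        best = min(scores, key=scores.get)  # descriptor with the lowest RSS
        if scores[best] <= 1e-12:  # perfect fit: accept it and stop
            selected.append(best)
            break
        # Partial F statistic for adding the best candidate to the model.
        f_stat = (current - scores[best]) / (scores[best] / df_resid)
        if f_stat < f_to_enter:
            break  # no remaining descriptor improves the fit enough
        selected.append(best)
        current = scores[best]
    return selected
```

Each step refits the model once per remaining descriptor, so the cost grows quickly with the number of descriptors; this is one motivation for regularization methods that select descriptors within a single fit.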

Methods
Coordinate Descent Algorithm for Different Thresholding Operators
Simulated Data
Real Data
Analyses of Simulated Data
Analyses of Real Data
Conclusions