Abstract

Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potency of their biological activities. In QSAR models, the physical and chemical information of a molecule is encoded as numerical values called descriptors. Recently, experimental test results (profiles) have also been used as descriptors of chemicals. The Profile QSAR 2.0 (pQSAR) model proposed by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) with partial least squares regression (PLSR). In the pQSAR model, the missing values of the profile table are first filled with RFR predictions, and PLSR models are then built on the predicted profiles. Note that in the second step of the pQSAR method, both the predictor variables (the profiles) and the response variables of the PLSR are activity values. The PLSR predictions can therefore be used to update the profile table, and the second step can then be repeated. In this work, we propose an extended pQSAR model built from RFRs and PLSRs. Starting from the fully filled, initially predicted profile table, we iteratively updated the table with the two kinds of prediction models, RFRs and PLSRs, on the PKIS and ChEMBL data sets. Although the prediction performance of individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables at a given iteration performs better. This ensemble model achieves better prediction performance than the pQSAR model in terms of Pearson’s $R^{2}$.
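The evaluation and ensembling ideas described above can be illustrated with a minimal sketch. It uses only the Python standard library and does not show the model fitting itself. The function names (`pearson_r2`, `average_tables`, `all_update_sequences`) are hypothetical, and reading "all possible predicted profile tables" as every RFR/PLSR update sequence of a given length is an assumption, not the authors' stated implementation.

```python
from itertools import product


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        # Correlation is undefined for a constant vector; this sketch returns 0.
        return 0.0
    return cov * cov / (var_t * var_p)


def average_tables(tables):
    """Element-wise mean of equally shaped profile tables (lists of rows)."""
    n_tables = len(tables)
    n_rows = len(tables[0])
    n_cols = len(tables[0][0])
    return [
        [sum(t[i][j] for t in tables) / n_tables for j in range(n_cols)]
        for i in range(n_rows)
    ]


def all_update_sequences(n_iterations, updaters=("RFR", "PLSR")):
    """Assumed reading: every way to pick one updater per iteration."""
    return list(product(updaters, repeat=n_iterations))
```

Under this reading, two iterations give four update sequences, from `("RFR", "RFR")` to `("PLSR", "PLSR")`, and the ensemble prediction would be the `average_tables` of the four resulting tables. Because `pearson_r2` squares the correlation, it rewards linear association regardless of scale or offset.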

Highlights

  • The first step in rational drug design is to discover hit compounds that can activate or inhibit an enzyme such as a protein kinase

  • When repeatedly updating the profile data, we use its row vectors as representation vectors of the compounds, which are fed to the random forest regressions (RFRs) and partial least squares regressions (PLSRs)

  • We compare the performance with two baselines: the RFRs used for the initialization, as given in (2), and the Profile QSAR 2.0 (pQSAR) model of Martin et al. [27]–[29], because the proposed model was inspired by the pQSAR model and began as an attempt to improve its performance

Summary

INTRODUCTION

The first step in rational drug design is to discover hit compounds that can activate or inhibit an enzyme such as a protein kinase. Experimental in vivo or in vitro test results have been introduced as descriptors to fill missing biological values [24]–[26]. Among these attempts, Martin et al. [27]–[29] introduced a distinctive predictive ensemble method that uses bioactivity assay data with missing values, named Profile QSAR 2.0 (pQSAR). We propose an extended ensemble model of pQSAR using RFRs and PLSRs to improve the performance of kinase-compound bioactivity prediction. 94.5% of the values (1,983,209 out of 159 assays × 13,192 compounds) are missing. The proposed method appears to improve on the predictive performance of both pQSAR and RFR, with gains for almost all assays (154 of 159).
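The sparsity figure quoted above can be checked with a few lines of arithmetic. The snippet below is only a sanity check on the reported counts; the variable names are illustrative.

```python
# Counts as reported in the text.
n_assays = 159
n_compounds = 13_192
n_missing = 1_983_209

# Total number of cells in the assay-by-compound profile table.
n_cells = n_assays * n_compounds        # 2,097,528

# Share of cells with no measured activity, and the measured remainder.
missing_fraction = n_missing / n_cells
n_measured = n_cells - n_missing        # 114,319

print(f"{missing_fraction:.1%} missing, {n_measured:,} measured values")
```

The counts are consistent with the stated 94.5%, which leaves roughly 114 thousand measured activities to train on.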

ITERATION METHOD OF IMPROVING A MODEL PERFORMANCE
EVALUATION OF GOODNESS OF A PREDICTION MODEL BASED ON PEARSON’S $R^{2}$
PREDICTION METHOD BY MUTUAL COMPLEMENT OF
Findings
CONCLUSION
