Abstract
Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potencies of their biological activities. In QSAR models, the physical and chemical information of molecules is encoded as numerical values called descriptors. Recently, experimental test results (profiles) have been used as descriptors of chemicals. The Profile QSAR 2.0 (pQSAR) model suggested by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) and partial least squares regression (PLSR). In the pQSAR model, one fills the profile table’s missing values with RFRs and then builds PLSRs using the profile predictions. Note that in the second step of the pQSAR method, the PLSR’s predictor variables are profiles, that is, activity values, and the response variables are also activity values. Thus we can use the PLSRs to update the profile table and then repeat the second step. In this work, we propose an extended model of pQSAR generated by RFRs and PLSRs. We iteratively updated the full, initially predicted profile table with two kinds of prediction models, RFRs and PLSRs, for the PKIS and ChEMBL data sets. Even though the prediction performance of individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables for a given iteration shows better performance. This ensemble model has better prediction performance, in the sense of Pearson’s $R^{2}$, than the pQSAR model.
Highlights
The first step in rational drug design is to discover hit compounds that can possibly activate or inhibit an enzyme such as a protein kinase
In repeatedly updating the profile data, we use the row vectors as representation vectors of the compounds, which are fed to the random forest regressions (RFRs) and partial least squares regressions (PLSRs)
We compare the performance with that of the RFRs used for the initialization, as given in (2), and with the Profile QSAR 2.0 (pQSAR) model suggested by Martin et al. [27]–[29] as a baseline, because the proposed model was inspired by the pQSAR model and began as an attempt to improve its performance
Summary
The first step in rational drug design is to discover hit compounds that can possibly activate or inhibit an enzyme such as a protein kinase. Experimental in vivo or in vitro test results have been introduced as descriptors to fill missing biological values [24]–[26]. Among these attempts, Martin et al. [27]–[29] introduced a distinctive predictive ensemble method using bioactivity assay data with missing values, named Profile QSAR 2.0 (pQSAR). We propose an extended ensemble model of pQSAR using RFRs and PLSRs to improve the performance of a kinase-compound bioactivity prediction method. 94.5% of the values (1,983,209 out of 159 assays × 13,192 compounds) are missing. It appears that the proposed method improves on the predictive performance of both pQSAR and RFR, with an improvement for almost all assays (154 of 159).
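As a quick sanity check on the sparsity and coverage figures quoted above, the arithmetic can be reproduced directly. The variable names are illustrative; the numbers are the ones stated in the summary.

```python
n_assays = 159
n_compounds = 13_192
n_missing = 1_983_209
n_not_improved = 5

n_cells = n_assays * n_compounds           # total entries in the profile table
missing_fraction = n_missing / n_cells     # share of entries without a measurement
n_improved = n_assays - n_not_improved     # assays where the ensemble did better

print(n_cells)                             # 2097528
print(f"{missing_fraction:.2%}")           # 94.55%
print(n_improved, f"{n_improved / n_assays:.1%}")  # 154 96.9%
```

So roughly 114,000 of about 2.1 million entries are measured, consistent with the stated 94.5% missing rate.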