An industrial evaluation of proteochemometric modelling: Predicting drug-target affinities for kinases

Astrid Stroobants, Lewis H Mervin, Ola Engkvist, Graeme R Robb

doi:10.1016/j.ailsci.2023.100079

Open Access

Abstract

Deep learning proteochemometric (PCM) models have been reported to achieve excellent performance on public benchmarking datasets. Nevertheless, numerous papers have cast doubt on commonly used evaluation metrics, suggesting they do not reflect true prospective predictive abilities. The aim of this study is to provide a comprehensive assessment of the performance of a state-of-the-art PCM model on proprietary data and to evaluate its potential over other modelling approaches as a virtual screening tool for kinase inhibitors. Whilst the model has been shown to achieve an RMSE of 0.48 on a public benchmarking dataset, an impaired overall performance was observed for the proprietary dataset in this study, with an RMSE of 0.85 and a Pearson correlation coefficient of 0.65 using a temporal splitting strategy. We hypothesise that the more limited performance can be attributed in part to a shift in chemical space observed over time in an industrial setting, a shift that is not captured by the more lenient random ligand splitting strategy more commonly used on benchmarking datasets. The overall performance of the PCM model was statistically similar to that of a multitask model and only slightly superior to that of a k-nearest neighbours (KNN) and a random forest PCM model. A comprehensive performance analysis was carried out to capture the key challenges faced in the design of competitive kinase inhibitors, and it revealed the key limitations of PCM modelling. For example, the model showed poor predictive abilities for understudied targets and a limited ability to assess ligand selectivity and promiscuity, with no improved performance over a multitask model or a random forest PCM model. Overall, these findings reveal that the PCM model assessed in this study does not provide significant benefits over less complex models, such as a multitask model or a random forest PCM model, as a virtual screening tool for kinase inhibitors in an industrial setting.
Taken together, this study highlights the need for more robust evaluations of PCM models by using stricter splitting strategies, more extensive benchmarking and more comprehensive performance analysis beyond traditional metrics.
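
As an illustration only (this is not the authors' code), the contrast between the two splitting strategies and the two reported metrics can be sketched in a few lines of Python. The record fields (`ligand`, `date`), the function names and the 80/20 split are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def temporal_split(records, test_fraction=0.2, date_key="date"):
    """Train on the oldest measurements, test on the most recent ones.

    Assumes each record is a dict whose date is an ISO string (YYYY-MM-DD),
    so that lexicographic order equals chronological order.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = int(round(len(ordered) * (1 - test_fraction)))
    return ordered[:cut], ordered[cut:]


def random_ligand_split(records, test_fraction=0.2, ligand_key="ligand", seed=0):
    """Hold out a random subset of ligands, ignoring measurement dates."""
    ligands = sorted({r[ligand_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ligands)
    n_test = max(1, int(round(len(ligands) * test_fraction)))
    test_ligands = set(ligands[:n_test])
    train = [r for r in records if r[ligand_key] not in test_ligands]
    test = [r for r in records if r[ligand_key] in test_ligands]
    return train, test
```

In this sketch, a temporal split never lets the model see measurements made after the cut-off date, so newer chemistry ends up only in the test set. A random ligand split can place compounds from the same period, and potentially the same chemical series, on both sides.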

Full Text