Abstract

We describe the development of the GSK vEXP (virtual enhanced cross screen panel) for off-target pharmacology alerts. Deriving a panel of machine learning classification models, or quantitative structure-activity relationship (QSAR) models, for off-target safety assessment allows early alerting to risk factors in candidate drugs. The models are matched to an internal in vitro biochemical screening panel described previously, with some updates reported here. We show the extreme imbalance of some internal GSK datasets, and of most of the related external ChEMBL datasets, when potency thresholds relevant to in vitro screening are applied. Their small size and bias towards the active class make many ChEMBL datasets unmodellable at such thresholds. Although larger, many GSK datasets remain too imbalanced to give a performant model. We demonstrate the value of merging internal and external data to help rebalance datasets and improve the domain of applicability; model performance frequently improves on merged data. Efforts to collate public datasets with a far better balance of the missing inactives would likely do more to improve open-source models than simply increasing dataset size. We investigate moving the probability threshold and applying imbalanced learners to help overcome the imbalance problem. Both methods can produce models with improved performance when applied to imbalanced datasets. Datasets with a class imbalance of 95:5 or with fewer than 100 compounds were unmodellable. Where datasets had a class imbalance of 90:10, the imbalanced learners were often more performant than standard tree-based classifiers. No single classification algorithm consistently outperformed all others, and our approach emphasises a standardised, automated build-and-evaluate process across all classifiers to identify the best model.
Applications of vEXP include ranking hit compounds for fast prioritisation, flagging hit series that contain systematic scaffold-related or functional-group-related risks, and confirming that late-stage optimisation is not introducing new off-target activities in established chemical series.
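
As an illustration of the probability-threshold idea mentioned above, the following minimal sketch scans candidate thresholds on an imbalanced toy set and keeps the one that scores best. It is not the vEXP implementation: the scoring metric (Matthews correlation coefficient), the candidate grid, the function names `mcc` and `best_threshold`, and the 90:10 toy data are all assumptions made purely for illustration.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any marginal total is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(y_true, probs, candidates=None):
    """Return (threshold, score) maximising MCC over a grid (hypothetical helper)."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]  # assumed grid
    best_t = 0.5
    best_score = mcc(y_true, [int(p >= best_t) for p in probs])
    for t in candidates:
        score = mcc(y_true, [int(p >= t) for p in probs])
        if score > best_score:  # the first threshold reaching the top score wins
            best_t, best_score = t, score
    return best_t, best_score


# Invented toy data with a 90:10 imbalance: 18 inactives (0) and 2 actives (1).
# A classifier trained on such data tends to give actives low probabilities.
y_true = [0] * 18 + [1] * 2
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.03, 0.07, 0.20,
         0.22, 0.04, 0.06, 0.09, 0.11, 0.13, 0.18, 0.25, 0.38,
         0.36, 0.45]

default_score = mcc(y_true, [int(p >= 0.5) for p in probs])
tuned_t, tuned_score = best_threshold(y_true, probs)
print(f"default threshold 0.50: MCC = {default_score:.3f}")
print(f"tuned threshold {tuned_t:.2f}: MCC = {tuned_score:.3f}")
```

In this toy case the default threshold of 0.5 predicts no actives at all, whereas a lower threshold recovers both at the cost of one false positive. In practice the threshold should be chosen on held-out validation data, not on the data used to report final performance.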
