Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data.

Pavel Sidorov, Pedro J Ballester, Jérémy Ariey-Bonnet, Eddy Pasquier, Stefan Naulaerts

doi:10.3389/fchem.2019.00509

Abstract

Drug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible for small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to what extent drug combination synergy can be predicted by exploiting the largest dataset available to date (NCI-ALMANAC, with over 290,000 synergy determinations). Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5,000 pair-wise drug combinations, 60 cell lines, 4 types of models, and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least a 2-fold decrease in error with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with partner drugs is better predicted by the models. Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic.

Highlights

Drug combinations are a well-established form of cancer treatment (Bayat Mokhtari et al, 2017)
We perform exploratory modeling on the FG datasets to determine optimal settings for synergy prediction by assessing various types of features, data augmentation schemes and machine learning methods
The best median Rp across cell lines for Random Forest (RF) was obtained with 250 trees, a third of the features evaluated at each tree node, training data augmentation and Morgan FingerPrint Counts (MFPC) complemented by physico-chemical properties (256 and 7 features per drug, respectively)
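
The feature layout and evaluation metric mentioned in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a drug pair is encoded by concatenating the two per-drug vectors (256 MFPC counts plus 7 physico-chemical properties each) and that training data augmentation means adding every pair in both drug orders; neither detail is stated in this excerpt. The function names and the random toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

N_FP = 256  # MFPC count features per drug (from the text)
N_PC = 7    # physico-chemical features per drug (from the text)


def pair_features(drug_a, drug_b):
    """Concatenate two per-drug vectors into one pair vector (assumed encoding)."""
    return list(drug_a) + list(drug_b)


def augment(pairs):
    """Assumed augmentation: emit each combination in both drug orders."""
    rows, targets = [], []
    for drug_a, drug_b, score in pairs:
        rows.append(pair_features(drug_a, drug_b))
        targets.append(score)
        rows.append(pair_features(drug_b, drug_a))
        targets.append(score)
    return rows, targets


def pearson(x, y):
    """Pearson correlation coefficient (Rp) between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def toy_drug(rng):
    """Hypothetical random stand-in for a real drug's feature vector."""
    counts = [rng.randint(0, 5) for _ in range(N_FP)]
    props = [rng.random() for _ in range(N_PC)]
    return counts + props


rng = random.Random(0)
drugs = [toy_drug(rng) for _ in range(3)]
# Three toy combinations with made-up synergy scores.
pairs = [
    (drugs[0], drugs[1], 12.0),
    (drugs[0], drugs[2], -3.5),
    (drugs[1], drugs[2], 4.0),
]
rows, targets = augment(pairs)
print(len(rows), len(rows[0]))  # 6 rows, 526 features per row
```

With scikit-learn, the reported RF settings would presumably map to RandomForestRegressor(n_estimators=250, max_features=1/3) trained on rows and targets, with pearson applied to held-out predictions to obtain Rp. That mapping is an interpretation of the highlight rather than a documented configuration.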

Summary

Introduction

Drug combinations are a well-established form of cancer treatment (Bayat Mokhtari et al, 2017). Quantitative Structure-Activity Relationship (QSAR) models establish a mathematical relationship between the chemical structure of a molecule, encoded as a set of structural and/or physico-chemical features (descriptors), and its biological activity on a target. Such methods have been successfully used in a wide variety of pharmacology and drug design projects (Cherkasov et al, 2014), including cancer research (Chen et al, 2007; Mullen et al, 2011; Ali and Aittokallio, 2018). QSAR modeling has achieved accurate prediction of compound activity on non-molecular targets such as cancer cell lines (Kumar et al, 2014).

Methods

Results

Discussion

Conclusion