Abstract

The application of machine and deep learning methods in drug discovery and cancer research has gained considerable attention in recent years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in the prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.

Highlights

  • In the past years, deep learning (DL) methods have been successfully applied to a variety of research topics in biomedicine and drug discovery [1,2,3]

  • We identify CatBoost Gradient Boosting on Decision Trees (GBDT) as an optimal regression model for the prediction of drug combination sensitivity and synergy after testing 13 algorithms on 10% of the DrugComb dataset in three replicates (Table 2)

  • We systematically compared 11 variants of molecular representations in predicting drug combination sensitivity and synergy scores, and evaluated their relationships based on clustering performance and centered kernel alignment (CKA)-based fingerprint similarity
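
To make the CKA-based similarity mentioned above concrete, the following is a minimal sketch of the standard linear form of Centered Kernel Alignment between two representations of the same set of molecules. It is not the adapted variant used in this work, whose details are not given in this section. The function names and the toy matrices are hypothetical and chosen only for illustration; each matrix holds one row per molecule and one column per feature.

```python
import math


def center_columns(matrix):
    """Subtract the column mean from every entry (rows = molecules)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def cross_product(a, b):
    """Return a^T b for two row-major matrices with the same number of rows."""
    return [
        [sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def linear_cka(x, y):
    """Standard linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).

    x and y describe the same molecules in the same row order; they may
    have different numbers of columns. This is an illustrative sketch,
    not the adapted metric from the paper.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    xc = center_columns(x)
    yc = center_columns(y)
    numerator = frobenius_norm(cross_product(yc, xc)) ** 2
    denominator = frobenius_norm(cross_product(xc, xc)) * frobenius_norm(
        cross_product(yc, yc)
    )
    if denominator == 0:
        raise ValueError("CKA is undefined for a constant representation")
    return numerator / denominator


# Hypothetical toy representations of four molecules (not real fingerprints).
rep_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
rep_b = [[2.0 * v for v in row] for row in rep_a]  # rep_a rescaled
rep_c = [[-row[1], row[0]] for row in rep_a]       # rep_a rotated by 90 degrees

similarity_scaled = linear_cka(rep_a, rep_b)
similarity_rotated = linear_cka(rep_a, rep_c)
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of either representation, so both comparisons above should come out at approximately 1; values near 0 indicate representations whose centered features are uncorrelated.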


Summary

Introduction

Deep learning (DL) methods have been successfully applied to a variety of research topics in biomedicine and drug discovery [1,2,3]. Deep neural networks achieve state-of-the-art performance in medical computer vision tasks and protein structural modeling, enabling de novo generation of drug candidates and development of prognostic clinical models [4,5,6,7,8]. However, the performance of DL models is context-dependent [9,10,11,12].

