A multiple classifier system identifies novel cannabinoid CB2 receptor ligands

David Ruano-Ordás, Iryna Yevseyeva, Michael T M Emmerich, Jose R Mendez, Lindsey Burggraaff, Rongfang Liu, Gerard J P Van Westen, Cas Van Der Horst, Laura H Heitman

doi:10.1186/s13321-019-0389-9

Abstract

Drugs have become an essential part of our lives due to their ability to improve people’s health and quality of life. However, for many diseases, approved drugs are not yet available or existing drugs have undesirable side effects, driving the pharmaceutical industry to discover new drugs and active compounds. Drug development is an expensive process that typically starts with the detection of candidate molecules (screening) once a protein target has been identified. Efficient, high-performance screening techniques have therefore become critical for containing these costs, and the popularity of computer-based screening (often called virtual screening or in silico screening) has increased rapidly during the last decade. A wide variety of Machine Learning (ML) techniques has been used in conjunction with chemical structure and physicochemical properties for screening purposes, including (i) simple classifiers, (ii) ensemble methods, and, more recently, (iii) Multiple Classifier Systems (MCS). Here, we apply an MCS for virtual screening (D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid CB2 ligands obtained from the ChEMBL database. The HTS collection of Enamine (1,834,362 compounds) was virtually screened with D2-MCS, identifying 48,232 potentially active molecules. These molecules were then ranked, and 21 promising novel compounds were selected for in vitro evaluation. Experimental validation confirmed six highly active hits (> 50% displacement at 10 µM and subsequent Ki determination) and an additional five medium active hits (> 25% displacement at 10 µM). Hence, D2-MCS provided a hit rate of 29% for highly active compounds and an overall hit rate of 52%.
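The hit rates quoted in the abstract follow directly from the reported counts: 6 of 21 tested compounds were highly active, and 6 + 5 = 11 of 21 were at least medium active. The short Python check below uses only the numbers stated in the abstract; the variable and function names are illustrative and are not part of D2-MCS.

```python
# Screening funnel and hit-rate arithmetic, using only the counts reported in the abstract.
library_size = 1_834_362      # Enamine HTS collection
predicted_active = 48_232     # molecules flagged as potentially active by D2-MCS
tested = 21                   # compounds selected for in vitro evaluation
highly_active = 6             # > 50% displacement at 10 µM
medium_active = 5             # > 25% displacement at 10 µM


def hit_rate(hits: int, n_tested: int) -> float:
    """Fraction of tested compounds confirmed as hits."""
    return hits / n_tested


fraction_flagged = predicted_active / library_size
high_rate = hit_rate(highly_active, tested)
overall_rate = hit_rate(highly_active + medium_active, tested)

print(f"flagged by D2-MCS: {fraction_flagged:.2%} of the library")  # 2.63%
print(f"highly active hit rate: {high_rate:.0%}")                   # 29%
print(f"overall hit rate: {overall_rate:.0%}")                      # 52%
```

The overall hit rate counts highly and medium active compounds together, which is why it is 11/21 rather than 5/21.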

Highlights

In silico drug discovery relies on different computer-based techniques to find a novel or improved bio-active compound, which should exhibit a strong affinity to a particular target
Lenselink et al. [15] demonstrate the suitability of Deep Neural Networks (DNN) [16] and Random Forests (RF) [17], compared with single Machine Learning (ML) models, for predicting the bioactivity of molecules
Compounds identified as active by the D2-MCS classifier were clustered on the same binary features (FCFP_6) used for model training, using the Cluster Molecules component of Pipeline Pilot version 2016 [23]
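Pipeline Pilot is commercial software, and the algorithm behind its Cluster Molecules component is not reproduced here. As a hedged illustration of what clustering molecules on binary fingerprint features involves, the sketch below swaps in a simple greedy leader (sphere-exclusion) clustering on Tanimoto similarity. Fingerprints are stored as sets of "on" feature IDs; the molecule names, feature IDs and similarity threshold are hypothetical and are not real FCFP_6 output.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' feature IDs."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Set[int]], threshold: float = 0.6) -> List[List[str]]:
    """Greedy leader clustering: each molecule joins the first cluster whose leader
    is at least `threshold` similar to it; otherwise it founds a new cluster."""
    leaders: List[Set[int]] = []
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for i, leader_fp in enumerate(leaders):
            if tanimoto(fp, leader_fp) >= threshold:
                clusters[i].append(name)
                break
        else:  # no sufficiently similar leader found
            leaders.append(fp)
            clusters.append([name])
    return clusters


# Toy "fingerprints": hypothetical feature IDs for four hypothetical molecules.
fps = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {10, 11, 12},
    "mol_d": {10, 11, 13},
}
print(leader_cluster(fps, threshold=0.5))
# [['mol_a', 'mol_b'], ['mol_c', 'mol_d']]
```

The result depends on the input order and the threshold (raising it to 0.7 here yields four singleton clusters), which is one reason production tools use more elaborate partitioning schemes.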

Summary

Introduction

In silico (or computational) drug discovery relies on different computer-based techniques to find a novel or improved bio-active compound, which should exhibit a strong affinity to a particular target. The use of ensemble methods has contributed to significant performance improvements in the virtual screening domain. However, their introduction also brought some important shortcomings: (i) the random selection of the information often used to build each inner classifier, (ii) the common use of weak classifiers such as C4.5 or Decision Stumps to build the ensemble (although any ML classifier can be used), and (iii) the impossibility of combining different inner classifiers, and different configurations of them, with concrete subsets of the training information. Taking these issues into account, we apply an MCS toolkit (called D2-MCS [21]) to increase the performance of virtual screening.
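The contrast drawn here, between randomly sampled training information and a deliberate pairing of different inner classifiers with concrete feature subsets, can be illustrated with a minimal sketch. This is not the D2-MCS toolkit or its API: the class names, the two toy inner classifiers (nearest centroid and 1-nearest-neighbour), the hard-coded column assignments and the majority-vote combination are all assumptions chosen for illustration, and the 6-bit "fingerprints" are invented toy data.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[int]


class NearestCentroid:
    """Predicts the class whose mean feature vector is closest (squared Euclidean)."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> int:
        def sq_dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class OneNearestNeighbour:
    """Predicts the label of the training example at the smallest Hamming distance."""

    def fit(self, X: List[Vector], y: List[int]) -> "OneNearestNeighbour":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x: Vector) -> int:
        dists = [sum(a != b for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class MultipleClassifierSystem:
    """Each member pairs a fixed (not randomly sampled) set of feature columns with
    its own inner classifier; member predictions are combined by majority vote."""

    def __init__(self, members: List[Tuple[List[int], object]]):
        self.members = members

    def fit(self, X: List[Vector], y: List[int]) -> "MultipleClassifierSystem":
        for cols, clf in self.members:
            clf.fit([[x[c] for c in cols] for x in X], y)
        return self

    def votes(self, x: Vector) -> List[int]:
        return [clf.predict([x[c] for c in cols]) for cols, clf in self.members]

    def predict(self, x: Vector) -> int:
        return Counter(self.votes(x)).most_common(1)[0][0]


# Toy binary "fingerprints" (6 bits) labelled active (1) or inactive (0).
X_train = [
    [1, 1, 0, 1, 1, 0], [1, 1, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1], [0, 1, 1, 0, 0, 1], [0, 0, 0, 0, 1, 1],
]
y_train = [1, 1, 1, 0, 0, 0]

mcs = MultipleClassifierSystem([
    ([0, 1, 2], NearestCentroid()),
    ([3, 4, 5], OneNearestNeighbour()),
    ([0, 3, 5], NearestCentroid()),
]).fit(X_train, y_train)

query = [1, 0, 0, 0, 1, 1]
print(mcs.votes(query), "->", mcs.predict(query))  # prints: [1, 0, 0] -> 0
```

Because each member sees only its own columns, the members can disagree on the same query, and the vote resolves the disagreement. Using an odd number of members avoids ties for a binary label.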

Methods

Results

Conclusions