Abstract

Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, which makes it possible to compare compounds and to search for compounds with similar profiles. Most contemporary methods and implementations, however, provide only point predictions and lack valid measures of confidence in those predictions. Here we describe a methodology that uses Conformal Prediction for predicting off-target interactions, with models trained on data from 31 targets in the ExCAPE-DB dataset, selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor, and support vector machines were used as the underlying machine learning method. With conformal prediction, each prediction takes the form of one confidence p-value per class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the developed methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.
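How a conformal predictor arrives at one p-value per class can be illustrated with a minimal sketch. It assumes a class-conditional (Mondrian) inductive setup, in which each class has its own list of calibration nonconformity scores; the function names, the example scores and the significance level are hypothetical and are not taken from the published workflow. In an SVM-based setting the nonconformity scores could, for instance, be derived from the decision values, but that choice is also an assumption here.

```python
from typing import Dict, List, Set


def mondrian_p_values(
    calibration_scores: Dict[str, List[float]],
    test_scores: Dict[str, float],
) -> Dict[str, float]:
    """Return one conformal p-value per class (class-conditional calibration).

    calibration_scores: nonconformity scores of calibration examples, grouped
        by their true class (hypothetical input format).
    test_scores: nonconformity score of the new compound under each
        tentative class label.
    """
    p_values = {}
    for label, alpha_test in test_scores.items():
        cal = calibration_scores[label]
        # Count calibration examples of this class that are at least as
        # nonconforming as the test compound would be with this label.
        n_at_least = sum(1 for alpha in cal if alpha >= alpha_test)
        # The +1 terms account for the test compound itself.
        p_values[label] = (n_at_least + 1) / (len(cal) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration and test scores for a two-class target model.
calibration = {
    "active": [0.1, 0.4, 0.7, 0.9],
    "inactive": [0.2, 0.3, 0.5, 0.8],
}
test = {"active": 0.35, "inactive": 0.85}

p = mondrian_p_values(calibration, test)
labels = prediction_set(p, significance=0.25)
```

With these made-up numbers the compound receives a high p-value for "active" and a low one for "inactive", so at a significance level of 0.25 only "active" remains in the prediction set. Depending on the p-values and the chosen level, the set may also contain both classes or none.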

Highlights

  • Drug-target interactions are central to the drug discovery process (Yildirim et al., 2007) and are the subject of study for the field of chemogenomics (Bredel and Jacoby, 2004), which has emerged and grown over the last few decades

  • Models for all targets in Dataset1 were produced as portable Java Archive (JAR) files, which were built into Docker containers for easy publication as microservices

  • To check that the Conformal Prediction models are valid, calibration plots were generated in the cross-validation step of the workflow
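The data behind a calibration plot of this kind can be sketched as follows. For each significance level, the observed error rate is the fraction of held-out examples whose true class would be excluded from the prediction set, that is, whose p-value for the true class does not exceed that level. For a valid conformal predictor the observed error rate should not exceed the significance level, apart from statistical fluctuation. The function name and the example p-values below are hypothetical and only illustrate the calculation; they are not results from the study.

```python
from typing import List, Tuple


def calibration_curve(
    true_label_p_values: List[float],
    levels: List[float],
) -> List[Tuple[float, float]]:
    """Return (significance level, observed error rate) pairs.

    true_label_p_values: for each held-out example, the conformal p-value
        assigned to its true class.
    levels: the significance levels at which to evaluate the error rate.
    """
    n = len(true_label_p_values)
    curve = []
    for eps in levels:
        # An error occurs when the true class is left out of the prediction
        # set, i.e. its p-value is not greater than the significance level.
        errors = sum(1 for p in true_label_p_values if p <= eps)
        curve.append((eps, errors / n))
    return curve


# Made-up p-values for the true class of ten held-out compounds.
example_p_values = [0.05, 0.15, 0.3, 0.45, 0.5, 0.62, 0.7, 0.81, 0.9, 0.98]
curve = calibration_curve(example_p_values, levels=[0.1, 0.2, 0.5])
```

Plotting the second element of each pair against the first gives the calibration plot; points on or below the diagonal indicate that the error rate stays within the chosen significance level.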

Summary

Introduction

Drug-target interactions are central to the drug discovery process (Yildirim et al., 2007) and are the subject of study for the field of chemogenomics (Bredel and Jacoby, 2004), which has emerged and grown over the last few decades. The recent increase in the number of available structure-activity relationship (SAR) data points in interaction databases such as ChEMBL (Gaulton et al., 2017) and PubChem (Wang et al., 2017) makes it feasible to use ligand-based models to predict targets and panels of targets. Chembench is a web-based portal that, founded in 2008, is one of the first publicly available integrated cheminformatics web portals. It integrates a number of commercial as well as open source tools for dataset creation, validation and modeling. TargetHunter (Wang et al., 2013) is another online tool that uses chemical similarity to predict targets for ligands, and shows how training models on ChEMBL data can enable useful predictions on examples taken from PubChem bioassays. The polypharmacology browser (Awale and Reymond, 2017) is a web-based target prediction tool that queries ChEMBL bioactivity data using multiple fingerprints.
