Efficient multi-task chemogenomics for drug specificity prediction.

Benoit Playe,Chloé-Agathe Azencott,Véronique Stoven,Alexandre G De Brevern

doi:10.1371/journal.pone.0204999

Open Access

https://doi.org/10.1371/journal.pone.0204999

Abstract

Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. As experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as a problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points in order to solve the computational issues of proteome-wide SVM for chemogenomics. We compare NN-MT to several state-of-the-art methods and show that its prediction performance is similar or better, at an efficient computational cost. Compared to its competitors, the proposed method is particularly effective at predicting (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for either the protein or the ligand. NN-MT appears to be a good default method, providing state-of-the-art or better performance across the wide range of prediction scenarios considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the training set, and orphan cases.

Highlights

The current paradigm in rationalized drug design is to identify a small molecular compound that binds to a protein involved in the development of a disease in order to alter disease progression
We evaluate our multi-task Support Vector Machine (SVM) based methods and state-of-the-art approaches in several key scenarios that explore the impact of the similarity between the query pair and the training data on prediction performance, a point that is rarely discussed in the literature
We considered three families of proteins because they gather a wide range of therapeutic targets, and have been considered in other chemogenomics studies, providing reference prediction scores: G-Protein Coupled Receptors (GPCRs), ion channels (IC), and kinases

Summary

Introduction

The current paradigm in rationalized drug design is to identify a small molecular compound that binds to a protein involved in the development of a disease in order to alter disease progression. Once a hit ligand has been identified, often by combining in silico and in vitro approaches, this molecule needs to be optimized to meet the ADME (Absorption, Distribution, Metabolism, Elimination), toxicity, and industrial synthesis requirements. Pre-clinical and clinical assays are then organized to obtain approval from the regulatory agencies. When successful, this process often lasts more than ten years, and recent estimates set the cost of drug development at US$2.5 billion in 2013 [1].
