Drug Target Identification with Machine Learning: How to Choose Negative Examples

Matthieu Najm, Véronique Stoven, Chloé-Agathe Azencott, Benoit Playe

doi:10.3390/ijms22105118

Abstract

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search by limiting the number of required experiments. However, the drug-target interaction databases used for training exhibit strong statistical bias, which leads to many false positives and thus increases the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs and, more globally, for 200 approved drugs. For both the three detailed drug examples and the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and overall, the rank of the true targets improved. Our method corrects the statistical bias of databases and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
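The balancing constraint described above can be illustrated with a short sketch. Here we assume, as our own choice and not necessarily the authors' procedure, that it is cast as a max-flow (bipartite b-matching) problem: each drug may receive at most as many negatives as it has positives, each protein likewise, and each unlabeled drug-protein pair can be used at most once. If the maximum flow equals the number of positives, a perfectly balanced negative set exists. The function name `balanced_negatives`, the `seed` parameter for random tie-breaking, and the restriction to drugs and proteins that appear in at least one positive pair are all hypothetical.

```python
import random
from collections import Counter, defaultdict, deque


def balanced_negatives(positives, seed=0):
    """Hypothetical sketch: choose negative (drug, protein) pairs so that every
    drug and every protein appears as often among negatives as among positives.

    Solved as a max-flow on the unlabeled pairs (Edmonds-Karp).
    Returns (negatives, exact); exact is False if no balanced set exists.
    """
    pos = set(positives)
    drug_deg = Counter(d for d, _ in pos)
    prot_deg = Counter(p for _, p in pos)

    source, sink = ("source",), ("sink",)
    cap = defaultdict(int)   # residual capacity of each directed edge
    adj = defaultdict(list)  # neighbours in the residual graph

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].append(v)
        adj[v].append(u)

    # Candidate negatives: unlabeled pairs among drugs/proteins with positives.
    candidates = [(d, p) for d in drug_deg for p in prot_deg if (d, p) not in pos]
    random.Random(seed).shuffle(candidates)  # random tie-breaking between solutions

    for d, k in drug_deg.items():
        add_edge(source, ("drug", d), k)       # drug quota = its number of positives
    for d, p in candidates:
        add_edge(("drug", d), ("prot", p), 1)  # each pair used at most once
    for p, k in prot_deg.items():
        add_edge(("prot", p), sink, k)         # protein quota = its number of positives

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Collect the path and push its bottleneck capacity.
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push

    # A candidate edge whose residual capacity dropped to 0 carries one unit of flow.
    negatives = [(d, p) for d, p in candidates
                 if cap[(("drug", d), ("prot", p))] == 0]
    return negatives, flow == len(pos)
```

On the toy set `[("d1", "p1"), ("d2", "p2"), ("d3", "p3"), ("d4", "p4")]`, the function returns four negatives in which each drug and each protein appears exactly once. When a drug has more positives than available unlabeled partners, no balanced set exists and `exact` is `False`; one would then need to relax the constraint.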

Highlights

Drug discovery often relies on the identification of a therapeutic target, usually a protein playing a role in a disease
We explore how best to choose negative examples to correct the statistical bias of databases and reduce the number of false positive predictions, which is essential for reducing the number of biological experiments required to validate the true protein targets
The goal of the present paper was to tackle the question of protein target identification for new drug candidates, using machine learning (ML)-based chemogenomics
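To make the evaluation criteria concrete (fewer false positives among the top-ranked predicted targets, and a better rank for the true targets), here is a minimal sketch of how these two quantities could be computed for one drug and averaged over a set of drugs. The function names, the cutoff `k`, and the score-dictionary input format are illustrative assumptions; the paper's exact metric definitions may differ.

```python
from statistics import mean


def rank_metrics(scores, true_targets, k=10):
    """Hypothetical sketch: evaluate one drug's ranked target predictions.

    scores: mapping protein -> predicted interaction score (higher = more likely).
    true_targets: set of proteins known to interact with the drug.
    Returns (sorted ranks of the true targets, number of non-targets in the top k).
    Ties keep the input order of `scores`, since Python's sort is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    position = {p: i + 1 for i, p in enumerate(ranked)}  # 1-based ranks
    ranks = sorted(position[t] for t in true_targets if t in position)
    false_pos_top_k = sum(1 for p in ranked[:k] if p not in true_targets)
    return ranks, false_pos_top_k


def average_false_positives(per_drug, k=10):
    """Mean number of false positives in the top k over several drugs.

    per_drug: iterable of (scores, true_targets) pairs.
    """
    return mean(rank_metrics(s, t, k)[1] for s, t in per_drug)
```

For scores `{"p1": 0.9, "p2": 0.8, "p3": 0.4, "p4": 0.2}` with true target `p3`, the true target sits at rank 3 and two false positives appear in the top 3.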

Summary

Introduction

Drug discovery often relies on the identification of a therapeutic target, usually a protein playing a role in a disease. Small-molecule drugs that interact with the protein target to alter disease development are designed or searched for in large molecular databases. In recent years, there has been renewed interest in phenotypic drug discovery, which does not rely on prior knowledge of the target. The pharmaceutical industry has also invested more effort in poorly understood rare diseases, for which therapeutic targets have not yet been discovered. Once identified, the target points to key biological pathways involved in the disease, helping to better understand its molecular basis.

Objectives

Methods

Results

Conclusion