Abstract

Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scales in the protein and chemical spaces. They differ from more classical ligand-based methods (also called QSAR) that predict ligands for a given protein receptor. In the context of drug discovery process, chemogenomics allows to tackle the question of predicting off-target proteins for drug candidates, one of the main causes of undesirable side-effects and failure within drugs development processes. The present study compares shallow and deep machine-learning approaches for chemogenomics, and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent developments in deep learning algorithms enable to learn abstract numerical representations of molecular graphs and protein sequences, in order to optimise the performance of the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN), as a feed-forward neural network taking as input the combination of molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods, and competes with deep methods with expert-based descriptors. However, on small datasets, shallow methods present better prediction performance than deep learning methods. Then, we evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data such as auxiliary tasks for which large datasets are available, or independently, multiple molecule and protein attribute views.

Highlights

  • Interest of chemogenomics in drug discovery The current paradigm in rationalised drug design usually associates a disease to one or several protein targets, called therapeutic targets, belonging to biological pathways that are involved in the disease development

  • Reference shallow methods As reference methods not involving deep learning, we considered the kronSVM [5] and Neighbourhood Regularised Logistic Matrix Factorisation (NRLMF) [23] approaches introduced in "Reference shallow methods in machine-learning forchemogenomics"section, because they led to state-of-the-art performances in various studies and datasets

  • Comparison of the proposed chemogenomic neural network to reference methods We first compared the performance of the chemogenomic neural network (CN) to those of state-of-theart reference methods, namely kronSVM and NRLMF for machine-learning shallow methods, and forward neural network (FNN) with expert-based descriptors as inputs for deep-learning methods, as described in "Materials and methods"

Read more

Summary

Introduction

Interest of chemogenomics in drug discovery The current paradigm in rationalised drug design usually associates a disease to one or several protein targets, called therapeutic targets, belonging to biological pathways that are involved in the disease development. Most of the hits identified by highthroughput screening (HTS) fail to become approved drugs because of unwanted side effects and toxicity. This in part results from unexpected interactions with so called “off-target” proteins due to drugs lack of specificity. The word “proteochemometrics” is often preferred in papers that study the impact of protein and molecule descriptors on prediction performances [7, 8]. In: Proceedings of the 8th ACM international conference on bioinformatics, computational biology, and health informatics, pp.

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call