Abstract

The identification of drug-target interactions (DTIs) plays a key role in drug discovery and development. Benefitting from large-scale drug databases and verified DTI relationships, many machine-learning methods have been developed to predict DTIs. However, because useful information is difficult to extract from molecules, the performance of these methods is limited by the representation of drugs and target proteins. This study proposes a new model called EmbedDTI to enhance the representation of both drugs and target proteins and thereby improve the performance of DTI prediction. For protein sequences, we leverage language modeling to pretrain the feature embeddings of amino acids and feed them to a convolutional neural network model for further representation learning. For drugs, we build two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph, and adopt a graph convolutional network with an attention module to learn the embedding vectors for the graphs. We compare EmbedDTI with existing DTI predictors on two benchmark datasets. The experimental results show that EmbedDTI outperforms the state-of-the-art models and that the attention module can identify the components of compounds that are crucial for DTIs.

Highlights

  • The detection of drug-target interactions (DTIs) is a key step in drug development and drug repositioning

  • We assess the performance of EmbedDTI on two benchmark sets, the Kinase dataset

  • The benefit of two-level graphs is not as obvious as on Davis, while concordance index (CI) is increased by 0.013 in EmbedDTI compared with GraphDTA
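The concordance index mentioned above measures how well predicted affinities preserve the ordering of the measured ones. The sketch below follows the pairwise definition commonly used in DTI regression benchmarks; it is an assumption that this matches the exact variant used for EmbedDTI and GraphDTA, and the function name and example values are hypothetical.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Pairwise concordance index (CI) between measured and predicted values.

    Assumed definition: over all pairs with different measured affinities,
    a pair scores 1 if the predictions are ordered the same way, 0.5 if the
    predictions are tied, and 0 otherwise. CI is the mean score.
    """
    score = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal measured affinity are not comparable
        comparable += 1
        # Orient the difference so that a positive value means "concordant".
        diff = (p_i - p_j) if t_i > t_j else (p_j - p_i)
        if diff > 0:
            score += 1.0
        elif diff == 0:
            score += 0.5
    if comparable == 0:
        raise ValueError("CI is undefined: no pairs with distinct true values")
    return score / comparable


# Hypothetical affinities for four drug-target pairs (illustration only).
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.1, 6.0, 7.5, 7.5]
ci = concordance_index(measured, predicted)
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering, so a gain of 0.013 is a small but measurable shift in the fraction of correctly ordered pairs.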

Introduction

The detection of drug-target interactions (DTIs) is a key step in drug development and drug repositioning. High-throughput screening (HTS) experiments have greatly accelerated the identification of DTIs. However, HTS experiments are costly and laborious, and they cannot meet the need to reveal DTIs for millions of existing compounds and thousands of targets [1,2]. There is therefore strong motivation to establish computational tools that predict DTIs automatically [3]. The rapid increase of DTI data in public databases, such as ChEMBL [4], DrugBank [5], and SuperTarget [6], has enabled large-scale in silico identification of DTIs. Computational methods mainly fall into three categories: docking-based, similarity search-based, and feature-based.
