Abstract
The identification of drug-target interactions (DTIs) plays a key role in drug discovery and development. Benefitting from large-scale drug databases and verified DTI relationships, many machine-learning methods have been developed to predict DTIs. However, because useful information is difficult to extract from molecules, the performance of these methods is limited by the representation of drugs and target proteins. This study proposes a new model called EmbedDTI to enhance the representation of both drugs and target proteins and to improve the performance of DTI prediction. For protein sequences, we leverage language modeling to pretrain the feature embeddings of amino acids and feed them to a convolutional neural network model for further representation learning. For drugs, we build two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph, and adopt a graph convolutional network with an attention module to learn the embedding vectors for the graphs. We compare EmbedDTI with existing DTI predictors on two benchmark datasets. The experimental results show that EmbedDTI outperforms the state-of-the-art models and that the attention module can identify the components of compounds that are crucial for DTIs.
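To make the graph-based drug representation more concrete, the sketch below applies one graph-convolution step to a toy atom graph. It is not the EmbedDTI implementation: the abstract does not state which graph convolution variant is used, so the widely used symmetric-normalized formulation (add self-loops, normalize by node degree, apply a linear map and ReLU) is assumed here. The attention module is omitted, and the function names, the toy molecule, the one-hot features, and the identity weight matrix are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumed formulation; the source does not specify the exact variant.
    """
    n = len(adjacency)
    # Add self-loops so every atom keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    a_norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
              for i in range(n)]
    # Aggregate neighbor features, then apply the linear map and ReLU.
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Toy atom graph for the heavy atoms of ethanol (C-C-O); purely illustrative.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],   # carbon, one-hot over (C, O)
            [1.0, 0.0],   # carbon
            [0.0, 1.0]]   # oxygen
weights = [[1.0, 0.0],
           [0.0, 1.0]]    # identity matrix standing in for learned parameters

embeddings = gcn_layer(adjacency, features, weights)
```

After this step, each atom's vector mixes its own element type with those of its bonded neighbors. The same operation could in principle be applied to a substructure graph, where nodes are groups of atoms rather than single atoms.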
Highlights
The detection of drug-target interactions (DTIs) is a key step in drug development and drug repositioning
We assess the performance of EmbedDTI on two benchmark sets, the Kinase dataset
The benefit of two-level graphs is not as obvious as on Davis, while the concordance index (CI) increases by 0.013 for EmbedDTI compared with GraphDTA
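For readers unfamiliar with the concordance index mentioned above, the following is a minimal sketch of how CI is commonly computed for affinity prediction. It assumes the usual pairwise definition: over all pairs with different true affinities, count the fraction whose predicted order matches the true order, with tied predictions counted as half. The function name and the example values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal true affinity are not comparable
        comparable += 1
        # Orient the pair so that "hi" is the sample with the larger true affinity.
        p_hi, p_lo = (p_i, p_j) if t_i > t_j else (p_j, p_i)
        if p_hi > p_lo:
            concordant += 1.0   # predicted order agrees with true order
        elif p_hi == p_lo:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Made-up affinities: one of the six pairs is ranked in the wrong order.
true_affinity = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 6.5, 6.2, 8.3]
ci = concordance_index(true_affinity, predicted)
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering, so a gain of 0.013 is a modest but measurable improvement in ranking quality.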
Summary
The detection of drug-target interactions (DTIs) is a key step in drug development and drug repositioning. High-throughput screening (HTS) experiments have greatly accelerated the identification of DTIs. However, HTS experiments are costly and laborious, so they cannot meet the need to reveal DTIs for millions of existing compounds and thousands of targets [1,2]. There is therefore a strong motivation to establish computational tools that predict DTIs automatically [3]. The rapid increase of DTI data in public databases, such as ChEMBL [4], DrugBank [5], and SuperTarget [6], has enabled large-scale in silico identification of DTIs. The computational methods mainly fall into three categories, namely docking-based, similarity search-based, and feature-based.