Abstract

Motivation: Drug–target interaction (DTI) prediction is a foundational task for in silico drug discovery, which is costly and time-consuming due to the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce results that are less accurate and difficult to explain, and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data.

Results: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.

Availability and implementation: The model scripts are available at https://github.com/kexinhuang12345/moltrans.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Drug discovery is notoriously costly and time-consuming due to the need for experimental search over a large drug compound space.

  • Drug–target protein interaction (DTI) serves as the foundation for finding new drugs and new indications of existing drugs, since the therapeutic effects of drug compounds are detected by examining DTIs [1].

  • We show empirically that MolTrans has robustly improved predictive performance over state-of-the-art baselines.

Summary

Introduction

Drug discovery is notoriously costly and time-consuming due to the need for experimental search over a large drug compound space. Deep learning approaches to DTI prediction often take drug and protein data as inputs, cast DTI as a classification problem, and make predictions by feeding the inputs through deep learning models such as deep neural networks (DNN) [3], deep belief networks (DBN) [4], and convolutional neural networks (CNN) [5, 6, 7]. Despite these efforts, challenges remain open. In particular, the model architectures in previous works are not designed to enable the integration of massive datasets. To solve these challenges, we propose a transformer-based [12], bio-inspired molecular data representation method (coined MolTrans) to leverage vast unlabelled data for in silico DTI prediction. We show empirically that MolTrans has robustly improved predictive performance over state-of-the-art baselines.

Related Works
Method
Literature
Augmented Transformer Embedding Module
Q2: MolTrans has competitive performance in unseen drug and target setting
Q3: MolTrans performs best with scarce data
Q4: MolTrans allows model understanding
Small: we use a smaller dataset to train FCS
Findings
Conclusion