Abstract

Identifying drug–target interactions (DTIs) is important for drug discovery, but exhaustively searching the drug–target space is a major bottleneck. Many deep learning models have recently been proposed to address this problem. However, these models have largely neglected interpretability, which is closely related to a model’s performance. We hypothesized that training a model to predict important regions on a protein sequence would both increase DTI prediction performance and yield a more interpretable model. We therefore constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as DTIs between them. To train the model, we collected protein–ligand complexes and the protein sequences of their binding sites, and pretrained the model to predict BRs for a given protein sequence–ligand pair via object detection employing transformers. After pretraining on BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BR prediction model indeed improved DTI prediction performance. The HoTS model showed good performance in BR prediction on independent test datasets even though it does not use 3D structure information. Furthermore, the HoTS model achieved the best performance in DTI prediction on the test datasets. Additional analysis confirmed that attention was appropriately assigned to BRs and that transformers are important for BR and DTI prediction. The source code is available on GitHub (https://github.com/GIST-CSBL/HoTS).

Highlights

  • Identifying drug–target interactions (DTIs) is a crucial step in drug discovery

  • As stated above, the average precision (AP) dropped significantly at the first DTI training epoch; over additional DTI training epochs, AP values converged, following the trend of those for the binding region (BR) prediction epochs

  • Given the observed convergence in model performance, we infer that the BR and DTI prediction models shared common features
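
The highlights above refer to average precision (AP) for BR prediction without showing how such a score is computed. The following is a minimal sketch of one common formulation, not the HoTS implementation: it assumes binding regions are half-open residue intervals on the sequence, that a prediction is matched to a true region by 1D intersection over union (IoU), and that AP is the mean of the precision values at each true-positive rank. The names `interval_iou` and `average_precision`, and the interval convention, are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumption: a region is (start, end) in residue indices, end exclusive.
Interval = Tuple[int, int]


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two half-open 1D intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(scored_hits: List[Tuple[float, bool]], n_true: int) -> float:
    """Mean of precision values taken at each true-positive rank.

    scored_hits: (confidence, is_true_positive) for every prediction.
    n_true: number of ground-truth regions; regions that are never
            recovered lower the score because they stay in the denominator.
    """
    if n_true == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(scored_hits, key=lambda x: x[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / n_true
```

In this sketch, a prediction would be flagged as a true positive when its `interval_iou` with an unmatched ground-truth region exceeds a chosen threshold; the threshold and matching rule are left to the caller because the text here does not state them.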

Summary

Introduction

Identifying drug–target interactions (DTIs) is a crucial step in drug discovery. As it is not feasible to test all chemical compounds against a given target protein, in silico prediction of possible active compounds using massive chemical libraries can increase the efficiency of drug discovery [1]. Thanks to the vast amount of information on drug compounds and their targets [2], as well as advances in computing power, researchers have been able to develop DTI prediction models using the proteochemometric (PCM) approach [3]. In protein feature engineering for DTI prediction, identifying binding pockets/sites is important for both prediction performance and comprehensive modeling [13,14,15]. Many computational models have been developed to identify binding pockets/sites.
