Abstract

Background: Recently, machine learning-based ligand activity prediction methods have been greatly improved. However, if known active compounds of a target protein are unavailable, the machine learning-based method cannot be applied. In such cases, docking simulation is generally applied because it only requires a tertiary structure of the target protein. However, the conformation search and the evaluation of binding energy of docking simulation are computationally heavy and thus docking simulation needs huge computational resources. Thus, if we can apply a machine learning-based activity prediction method for a novel target protein, such methods would be highly useful. Recently, Tsubaki et al. proposed an end-to-end learning method to predict the activity of compounds for novel target proteins. However, the prediction accuracy of the method was still insufficient because it only used amino acid sequence information of a protein as the input.

Results: In this research, we proposed an end-to-end learning-based compound activity prediction using structure information of a binding pocket of a target protein. The proposed method learns the important features by end-to-end learning using a graph neural network both for a compound structure and a protein binding pocket structure. As a result of the evaluation experiments, the proposed method has shown higher accuracy than an existing method using amino acid sequence information.

Conclusions: The proposed method achieved equivalent accuracy to docking simulation using AutoDock Vina with much shorter computing time. This indicated that a machine learning-based approach would be promising even for novel target proteins in activity prediction.
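The idea in the abstract of encoding both the compound and the binding pocket as graphs and scoring them jointly can be illustrated with a toy sketch. This is not the authors' architecture: the function names, the weight-free sum aggregation, the mean readout, and the logistic output layer are all assumptions chosen to keep the example dependency-free, and nothing here is trained.

```python
import math

def message_pass(features, edges, steps=2):
    """Toy message passing: each node sums its own and its neighbours'
    feature vectors, then applies tanh. No learned weights (assumption)."""
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:  # edges are treated as undirected
        neighbours[i].append(j)
        neighbours[j].append(i)
    h = [list(f) for f in features]
    for _ in range(steps):
        new_h = []
        for i in range(n):
            agg = list(h[i])
            for j in neighbours[i]:
                agg = [a + b for a, b in zip(agg, h[j])]
            new_h.append([math.tanh(a) for a in agg])
        h = new_h
    return h

def readout(h):
    """Mean over nodes gives one fixed-length vector per graph."""
    dim = len(h[0])
    return [sum(v[d] for v in h) / len(h) for d in range(dim)]

def predict_activity(compound, pocket, weights, bias=0.0):
    """Concatenate the two graph vectors and apply a logistic layer.
    `weights` stands in for parameters that would be learned end to end."""
    c = readout(message_pass(*compound))
    p = readout(message_pass(*pocket))
    z = sum(w * x for w, x in zip(weights, c + p)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy graphs: (node feature vectors, undirected edge list).
compound = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
pocket = ([[0.5, 0.5], [0.2, 0.1]], [(0, 1)])
score = predict_activity(compound, pocket, [0.1, -0.2, 0.3, 0.4])
```

In a real end-to-end model, the aggregation step and the output layer would carry trainable parameters optimized jointly on active and decoy labels; the sketch only shows how two graph encoders can feed a single prediction.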

Highlights

  • Machine learning-based ligand activity prediction methods have been greatly improved

  • DUD-E is a dataset created by Mysinger et al. for evaluating the performance of structure-based screening methods. A total of 102 target proteins were selected with diversity in mind, and active compounds and decoy compounds were prepared for each target

  • We checked that proteins in the training dataset had no sequence similarity to those in the test dataset using NCBI-BLAST [11] as described below
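A train/test similarity check of the kind mentioned in the last highlight could be sketched as follows. The sketch assumes BLAST was run with tabular output (`-outfmt 6`, whose default columns put the query ID first, the subject ID second, and the E-value eleventh). The function name, the ID sets, and the E-value cutoff are hypothetical; the cutoff used by the authors is not stated in this section.

```python
def cross_split_hits(blast_rows, train_ids, test_ids, max_evalue=1e-3):
    """Return (query, subject, evalue) for hits that link a training
    protein to a test protein. max_evalue is an assumed cutoff."""
    hits = []
    for row in blast_rows:
        if row.startswith("#"):
            continue  # skip comment lines
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column tabular row
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        crosses = (query in train_ids and subject in test_ids) or (
            query in test_ids and subject in train_ids
        )
        if crosses and evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits

# Hypothetical rows in BLAST tabular format (IDs and values are made up).
rows = [
    "trainA\ttestB\t35.0\t100\t60\t2\t1\t100\t1\t100\t1e-20\t90.0",
    "trainA\ttrainC\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150.0",
    "trainC\ttestB\t25.0\t40\t30\t1\t1\t40\t1\t40\t5.0\t20.0",
]
leaks = cross_split_hits(rows, {"trainA", "trainC"}, {"testB"})
```

An empty result would indicate that no training protein has a significant hit against any test protein under the chosen cutoff.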


Summary

Introduction

Machine learning-based ligand activity prediction methods have improved greatly. However, when no known active compounds are available for a target protein, such methods cannot be applied. The end-to-end learning method proposed by Tsubaki et al. can handle novel target proteins, but its prediction accuracy was still insufficient because it used only the amino acid sequence of the protein as input.

A typical drug discovery process takes 12–14 years and costs about $2.6 billion [1, 2]. At the initial stage of drug discovery, compound screening is often performed to select drug candidate compounds from a chemical compound library. Such a library sometimes contains millions to tens of millions of compounds, so the cost of the screening cannot be ignored. The accuracy has been improved recently and the technology has been applied successfully [3].

Tanebe and Ishida, BMC Bioinformatics (2021) 22:529

