Abstract
Background
Recently, machine learning-based ligand activity prediction methods have been greatly improved. However, if known active compounds of a target protein are unavailable, machine learning-based methods cannot be applied. In such cases, docking simulation is generally applied because it requires only a tertiary structure of the target protein. However, the conformation search and the binding energy evaluation in docking simulation are computationally heavy, so docking simulation needs huge computational resources. A machine learning-based activity prediction method that can be applied to a novel target protein would therefore be highly useful. Recently, Tsubaki et al. proposed an end-to-end learning method to predict the activity of compounds for novel target proteins. However, the prediction accuracy of the method was still insufficient because it used only the amino acid sequence information of a protein as the input.
Results
In this research, we proposed an end-to-end learning-based compound activity prediction method that uses structure information of a binding pocket of a target protein. The proposed method learns the important features by end-to-end learning, using a graph neural network for both the compound structure and the protein binding pocket structure. In the evaluation experiments, the proposed method showed higher accuracy than an existing method that uses amino acid sequence information.
Conclusions
The proposed method achieved accuracy equivalent to docking simulation using AutoDock Vina with much shorter computing time. This indicates that a machine learning-based approach is promising for activity prediction even for novel target proteins.
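The two-branch idea in the Results, one graph neural network for the compound and one for the binding pocket, can be illustrated with a minimal sketch. This is not the authors' architecture: the mean-aggregation update, the mean pooling, the logistic output, the random untrained weights, and all names (`embed_graph`, `predict_activity`, `random_params`) are assumptions made only for illustration.

```python
import math
import random


def relu_linear(weight, vec):
    """Apply a dense layer (weight is a list of rows) followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in weight]


def embed_graph(node_feats, neighbors, weight, steps=2):
    """Mean-aggregation message passing, then mean pooling over nodes."""
    h = [list(f) for f in node_feats]
    dim = len(h[0])
    for _ in range(steps):
        updated = []
        for i, nbrs in enumerate(neighbors):
            group = [h[i]] + [h[j] for j in nbrs]  # the node plus its neighbours
            mean = [sum(g[k] for g in group) / len(group) for k in range(dim)]
            updated.append(relu_linear(weight, mean))
        h = updated
    return [sum(node[k] for node in h) / len(h) for k in range(dim)]


def predict_activity(compound, pocket, params):
    """Score a compound/pocket pair: two graph branches, then a logistic output."""
    joint = (embed_graph(compound["x"], compound["nbrs"], params["w_compound"])
             + embed_graph(pocket["x"], pocket["nbrs"], params["w_pocket"]))
    logit = sum(w * v for w, v in zip(params["w_out"], joint)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


def random_params(dim, seed=0):
    """Untrained random weights (hypothetical), only so the sketch can run."""
    rng = random.Random(seed)

    def square():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    return {"w_compound": square(), "w_pocket": square(),
            "w_out": [rng.uniform(-1, 1) for _ in range(2 * dim)], "b_out": 0.0}


# Toy inputs: a 3-atom chain and a 4-node pocket ring, 2 features per node.
compound = {"x": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], "nbrs": [[1], [0, 2], [1]]}
pocket = {"x": [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3], [0.8, 0.4]],
          "nbrs": [[1, 3], [0, 2], [1, 3], [0, 2]]}
score = predict_activity(compound, pocket, random_params(dim=2))
```

In an end-to-end setting, the weights of both branches and of the output layer would be trained jointly on active and inactive compound/pocket pairs; here they are random, so `score` carries no predictive meaning.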
Highlights
Machine learning-based ligand activity prediction methods have been greatly improved
DUD-E is a dataset created by Mysinger et al. for evaluating the performance of structure-based screening methods. A total of 102 target proteins were selected with diversity in mind, and active compounds and decoy compounds were prepared for each target
We checked that proteins in the training dataset had no sequence similarity to those in the test dataset using NCBI-BLAST [11] as described below
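A check like the one in the last highlight can be sketched as a filter over pairwise similarity hits. The sketch assumes the significant alignments have already been extracted from a BLAST run into `(query_id, subject_id)` pairs; that input format, the function name `filter_training_proteins`, and the protein identifiers are hypothetical, and the source does not state the thresholds or the exact procedure used.

```python
def filter_training_proteins(train_ids, test_ids, hits):
    """Keep training proteins with no reported hit against any test protein.

    hits: iterable of (query_id, subject_id) pairs, assumed to be the
    significant alignments already extracted from a BLAST run.
    """
    test_set = set(test_ids)
    flagged = set()
    for query, subject in hits:
        # A hit in either direction links the two proteins.
        if query in test_set:
            flagged.add(subject)
        if subject in test_set:
            flagged.add(query)
    return [p for p in train_ids if p not in flagged]


# Hypothetical identifiers, for illustration only.
train = ["protA", "protB", "protC"]
test = ["protX", "protY"]
hits = [("protA", "protX"), ("protY", "protC"), ("protB", "protD")]
kept = filter_training_proteins(train, test, hits)
```

Here `protA` and `protC` are dropped because each has a hit against a test protein, while `protB` is kept because its only hit is to a protein outside the test set.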
Summary
Machine learning-based ligand activity prediction methods have been greatly improved. However, if known active compounds of a target protein are unavailable, such methods cannot be applied. The end-to-end method proposed by Tsubaki et al. targets novel proteins, but its prediction accuracy was still insufficient because it used only the amino acid sequence information of a protein as the input. These days, a typical drug discovery process takes 12–14 years and costs about $2.6 billion [1, 2]. At the initial stage of drug discovery, compound screening is often performed to select drug candidate compounds from a chemical compound library. Such a library sometimes contains millions to tens of millions of compounds, so the cost of the screening cannot be ignored. The accuracy has been improved recently and the technology has been applied successfully [3].
Tanebe and Ishida, BMC Bioinformatics (2021) 22:529