QuoteTarget: A sequence-based transformer protein language model to identify potentially druggable protein targets.

Jiaxiao Chen, Luhua Lai, Minghua Deng, Jianfeng Pei, Zhonghui Gu, Youjun Xu

doi:10.1002/pro.4555

Abstract

Efficient computational methods for identifying drug target proteins can offset the high cost of experiments and are therefore of great significance for drug development. However, existing structure-based algorithms for drug target identification are limited by the small number of proteins with experimentally resolved structures, while existing sequence-based algorithms do not extract information from protein sequences effectively and thus show insufficient accuracy. Here, we combined the self-supervised, pretrained, sequence-based protein language model ESM1b with a graph convolutional neural network classifier to develop an improved sequence-based method for identifying drug target proteins. The complete model, named QuoteTarget, encodes proteins from sequence information alone and achieves an accuracy of 95% on the nonredundant drug target and nondrug target datasets constructed for this study. When applied to all proteins from Homo sapiens, QuoteTarget identified 1213 potential undeveloped drug target proteins. We further inferred residue-binding weights from the trained network using the gradient-weighted class activation mapping (Grad-CAM) algorithm. Notably, without any binding site information as input, the significant residues inferred by the model closely match experimentally confirmed drug molecule-binding sites. Thus, our work provides a highly effective sequence-based identifier for drug target proteins and yields new insights into recognizing drug molecule-binding sites. The entire model is available at https://github.com/Chenjxjx/drug-target-prediction.
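To make the residue-weighting idea concrete, the following is a minimal sketch of the generic Grad-CAM computation applied to per-residue features. It is not the authors' implementation: the function name `grad_cam_residue_weights`, the toy feature values, and the assumption that activations and gradients are available as plain residues-by-channels lists are all hypothetical and chosen only for illustration. In the real model these arrays would come from a layer of the trained network and from backpropagating the drug-target score.

```python
from typing import List


def grad_cam_residue_weights(
    features: List[List[float]],
    gradients: List[List[float]],
) -> List[float]:
    """Generic Grad-CAM over a sequence (illustrative, not the paper's code).

    features:  activations of one layer, shape [n_residues][n_channels]
    gradients: d(class score)/d(activation), same shape as `features`
    Returns one non-negative weight per residue, scaled so the maximum is 1.
    """
    n_res = len(features)
    n_ch = len(features[0])

    # Channel importance: average the gradient of each channel over all residues.
    alpha = [
        sum(gradients[i][c] for i in range(n_res)) / n_res
        for c in range(n_ch)
    ]

    # Per-residue map: importance-weighted sum over channels, then ReLU.
    raw = [
        max(0.0, sum(alpha[c] * features[i][c] for c in range(n_ch)))
        for i in range(n_res)
    ]

    # Scale to [0, 1] so residues can be compared within one protein.
    peak = max(raw)
    return [r / peak if peak > 0 else 0.0 for r in raw]


# Toy example: 3 residues, 2 channels (values are made up).
toy_features = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
toy_gradients = [[0.5, -0.5], [0.5, -0.5], [0.5, -0.5]]
toy_weights = grad_cam_residue_weights(toy_features, toy_gradients)
print(toy_weights)
```

In this toy case the third residue receives the highest weight because it is most active in the channel whose gradient pushes the score up. Residues with high weights would be the candidates to compare against known binding sites.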

Journal: Protein Science
Publication Date: Jan 26, 2023
Citations: 11

Similar Papers

Large-scale identification of potential drug targets based on the topological features of human protein–protein interaction network
Zhan-Chao Li ... Xiao-Yong Zou
Analytica Chimica Acta | VOL. 871
12 Feb 2015

Efficient Data Mining Algorithms for Screening Potential Proteins of Drug Target
Qi Wang ... Jincai Huang
Mathematical Problems in Engineering | VOL. 2017
01 Jan 2017

Properties of protein drug target classes.
Simon C Bull ... Andrew J Doig
PLoS ONE | VOL. 10
30 Mar 2015

Retrieval of Enterobacteriaceae drug targets using singular value decomposition
Rita Silvério-Machado ... Marcos A Dos Santos
Bioinformatics | VOL. 31
04 Dec 2014
