Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning.

Qianmu Yuan, Yu Wang, Yuedong Yang, Huiying Zhao, Sheng Chen

doi:10.1093/bib/bbac444

Abstract

More than one-third of the proteins in the Protein Data Bank contain metal ions. Correct identification of metal ion-binding residues is important for understanding protein functions and designing novel drugs. Because metal ions are small and highly versatile, computationally predicting their binding sites from protein sequence remains challenging. Existing sequence-based methods have low accuracy because they lack structural information, and they are time-consuming because they rely on multiple sequence alignment. Here, we propose LMetalSite, an alignment-free sequence-based predictor for binding sites of the four most frequently seen metal ions in BioLiP (Zn2+, Ca2+, Mg2+ and Mn2+). LMetalSite leverages a pretrained language model to rapidly generate informative sequence representations and employs a transformer to capture long-range dependencies. Multi-task learning is adopted to compensate for the scarcity of training data and to capture the intrinsic similarities between different metal ions. LMetalSite was shown to surpass state-of-the-art structure-based methods by more than 19.7, 14.4, 36.8 and 12.6% in area under the precision-recall curve on the four independent tests, respectively. Further analyses indicated that the self-attention modules are effective at learning the structural contexts of residues from protein sequence. We provide the data sets, source code and trained models of LMetalSite at https://github.com/biomed-AI/LMetalSite.
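The abstract reports performance as area under the precision-recall curve, computed separately for each of the four metal ions. As a rough illustration of that metric only, the sketch below computes average precision, one common step-wise estimator of this area, for per-residue binding scores. The function name `average_precision` and the toy labels and scores are assumptions made up for this example; they are not taken from LMetalSite, and the abstract does not state which estimator the authors used.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a binding residue, 0 otherwise.
    scores: predicted binding scores, higher means more likely to bind.
    Tied scores are ranked in input order, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive (binding) residue is required")
    # Rank residues from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


# Hypothetical toy data: (labels, scores) per metal ion, NOT results from the paper.
toy: Dict[str, Tuple[List[int], List[float]]] = {
    "Zn2+": ([0, 1, 0, 0, 1, 0], [0.10, 0.90, 0.20, 0.80, 0.70, 0.05]),
    "Ca2+": ([1, 0, 0, 1], [0.95, 0.30, 0.10, 0.60]),
}

per_ion = {ion: average_precision(y, s) for ion, (y, s) in toy.items()}
for ion, value in per_ion.items():
    print(f"{ion}: AUPR (average precision) = {value:.3f}")
```

Binding residues are typically a small minority of all residues, which is a common reason to summarize performance with the precision-recall curve rather than accuracy.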

Journal: Briefings in Bioinformatics
Publication Date: Oct 23, 2022
Citations: 26
License type: cc-by-nc-nd

Similar Papers

DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model.
Yitian Fang ... Qin Ma
Bioinformatics | VOL. 39
28 Nov 2023

Domain-based small molecule binding site annotation
Kevin A Snyder ... Michel Dumontier
BMC Bioinformatics | VOL. 7
17 Mar 2006

Structural and functional characterization of a unique hypothetical protein (WP_003901628.1) of Mycobacterium tuberculosis: a computational approach
Reaz Uddin ... Sidra Rafi
Medicinal Chemistry Research | VOL. 26
03 Mar 2017

Metal Ion Substrate Inhibition of Ferrochelatase
Gregory A Hunter ... Gloria C Ferreira
Journal of Biological Chemistry | VOL. 283
01 Aug 2008
