Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings.

Wayland Yeung, Zhongliang Zhou, Sheng Li, Natarajan Kannan

doi:10.1093/bib/bbac599

Open Access

Abstract

Protein language modeling is a fast-emerging deep learning method in bioinformatics with diverse applications such as structure prediction and protein design. However, its application to estimating sequence conservation for functional site prediction has not been systematically explored. Here, we present a method for the alignment-free estimation of sequence conservation using sequence embeddings generated from protein language models. Comprehensive benchmarks across publicly available protein language models reveal that ESM2 models provide the best ratio of performance to computational cost for conservation estimation. Applying our method to full-length protein sequences, we demonstrate that embedding-based methods are not sensitive to the order of conserved elements: conservation scores can be calculated for multidomain proteins in a single run, without the need to separate individual domains. Our method can also identify conserved functional sites within fast-evolving sequence regions (such as domain inserts), which we demonstrate by identifying conserved phosphorylation motifs in variable insert segments of protein kinases. Overall, embedding-based conservation analysis is a broadly applicable method for identifying potential functional sites in any full-length protein sequence and estimating conservation in an alignment-free manner. To run this on your protein sequence of interest, try our scripts at https://github.com/esbgkannan/kibby.
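The core idea in the abstract is that per-residue conservation can be predicted from language-model embeddings, so a query sequence never has to be aligned. The following is a minimal sketch of that idea under stated assumptions; it is not the paper's model or the code in the kibby repository. We assume (1) training targets are computed as 1 minus the normalized Shannon entropy of columns in a reference alignment; (2) a linear ridge regressor maps each residue's embedding to a conservation score; and (3) the embeddings are tiny made-up 2-D vectors standing in for real ESM2 per-residue embeddings. The names `column_conservation`, `solve`, `fit_ridge` and `predict` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column):
    """Return 1 - normalized Shannon entropy for one MSA column.

    Gaps and non-standard characters are ignored. A column with a single
    residue type scores 1.0; all 20 residues equally frequent scores 0.0.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0  # all-gap column: treated as unconserved (an arbitrary choice)
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ridge(X, y, lam=1e-3):
    """Fit linear weights (last entry is the bias) by ridge regression.

    Solves (X^T X + lam * I) w = X^T y, where X has a column of ones
    appended; for simplicity the bias is regularized too.
    """
    rows = [list(x) + [1.0] for x in X]
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(d)]
    return solve(A, b)


def predict(w, X):
    """Score each residue embedding with the fitted linear model."""
    return [sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) for x in X]


if __name__ == "__main__":
    # Training: conservation targets from a toy reference alignment.
    msa_columns = ["AAAAA", "AAAAG", "ACDEF", "KKKKK"]
    targets = [column_conservation(col) for col in msa_columns]
    # Hypothetical 2-D stand-ins for per-residue ESM2 embeddings
    # (real embeddings have hundreds to thousands of dimensions).
    train_embeddings = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9], [0.95, 0.05]]
    w = fit_ridge(train_embeddings, targets)
    # Inference: a new sequence is scored from its embeddings alone, no alignment.
    print(predict(w, [[0.8, 0.2], [0.2, 0.8]]))
```

Because the alignment is used only to produce training targets, scoring a new full-length (for example, multidomain) sequence needs nothing but its per-residue embeddings. Whether a linear model is adequate for real ESM2 embeddings is an assumption of this sketch, not something it demonstrates.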

Journal: Briefings in Bioinformatics
Publication Date: Jan 11, 2023
Citations: 12
License type: CC BY-NC 4.0
