Improved global protein homolog detection with major gains in function identification.

Mesih Kilinc, Kejue Jia, Robert L Jernigan

doi:10.1073/pnas.2211823120

Open Access

https://doi.org/10.1073/pnas.2211823120

Abstract

There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method that pushes homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent each protein numerically as a matrix (an embedding) and uses discrete cosine transforms to compress the data and extract the most essential part, significantly reducing the data size. This PRotein Ortholog Search Tool (PROST) is significantly faster, with linear runtimes, and, most importantly, computes the distances between pairs of protein sequences to yield homologs at significantly lower levels of sequence identity than was previously possible. The extent of allosteric effects in proteins points to the importance of global aspects of structure and sequence. PROST excels at global homology detection but not at detecting local homologs. Results are validated by strong similarities between the corresponding pairs of structures. The number of remote homologs detected increases significantly, pushing the effective sequence matches deeper into the twilight zone. For human protein sequences that presently have no assigned function, PROST finds significant numbers of putative homologs in 93% of cases and structurally verified assigned functions in 76.4% of these cases. The data compression enables massive searches for homologs with short search times while yielding significant gains in the numbers of remote homologs detected. The method is sufficiently efficient to permit whole-genome/proteome comparisons. The PROST web server is accessible at https://mesihk.github.io/prost.
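
The compression-and-distance idea described in the abstract can be illustrated with a minimal sketch. This is not the PROST implementation: the toy embedding stands in for language-model output, and keeping only the first few DCT coefficients along the sequence axis, dividing by sequence length, and comparing with an L1 distance are all assumptions made here for illustration. The names dct2, compress, l1_distance, and toy_embedding are hypothetical.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II of a 1-D list of floats (naive O(n^2) version)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def compress(embedding, keep=8):
    """Reduce an L x D embedding to a fixed keep x D matrix.

    Assumption: each embedding dimension is transformed along the sequence
    axis and only the `keep` lowest-frequency coefficients are retained.
    Dividing by L makes sequences of different lengths comparable.
    """
    length = len(embedding)
    dims = len(embedding[0])
    compressed = [[0.0] * dims for _ in range(keep)]
    for d in range(dims):
        column = [row[d] for row in embedding]
        coeffs = dct2(column)
        # Rows beyond the sequence length stay zero (only matters if L < keep).
        for k in range(min(keep, length)):
            compressed[k][d] = coeffs[k] / length
    return compressed


def l1_distance(a, b):
    """Sum of absolute differences between two equally sized matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def toy_embedding(length, dims=4, phase=0.0):
    """Hypothetical stand-in for a language-model embedding (L x D)."""
    return [
        [math.sin(2 * math.pi * (d + 1) * (i + 0.5) / length + phase) for d in range(dims)]
        for i in range(length)
    ]


# Two "proteins" with the same global profile but different lengths,
# and a third with a shifted profile.
query = compress(toy_embedding(50))
similar = compress(toy_embedding(60))
different = compress(toy_embedding(55, phase=1.5))

print("query vs similar:  ", l1_distance(query, similar))
print("query vs different:", l1_distance(query, different))
```

Because every sequence is reduced to the same small fixed-size matrix, each pairwise comparison takes constant time regardless of sequence length, which is one plausible reading of how the compression enables fast large-scale searches.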

Journal: Proceedings of the National Academy of Sciences
Publication Date: Feb 24, 2023
Citations: 25
License type: CC BY 4.0

Similar Papers

Fold homology detection using sequence fragment composition profiles of proteins
Armando D Solis ... Shalom R Rackovsky
Proteins: Structure, Function, and Bioinformatics | VOL. 78
16 Aug 2010

Reducing Dimensionality in Remote Homology Detection
S Dinesh
International Journal for Research in Applied Science and Engineering Technology | VOL. 9
31 Dec 2021

Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison.
Maricel G Kann ... Richard A Goldstein
Proteins | VOL. 48
04 Jun 2002

MRFalign: Protein Homology Detection through Alignment of Markov Random Fields
Jianzhu Ma ... Jinbo Xu
PLoS Computational Biology | VOL. 10
27 Mar 2014
