Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification

Meredita Susanty, Muhammad Khaerul Naim Mursalim, Rukman Hertadi, Ayu Purwarianti, Tati Le Rajab

doi:10.1016/j.compbiolchem.2024.108163

Abstract

The increasing demand for eco-friendly technologies in biotechnology necessitates effective and sustainable catalysts. Acidophilic proteins, which function optimally in highly acidic environments, hold immense promise for applications including food production, biofuels, and bioremediation. However, limited knowledge about these proteins hinders their exploration. This study addresses this gap with in silico methods that combine computational tools and machine learning. We propose a novel approach that predicts acidophilic proteins using protein language models (PLMs), accelerating discovery without extensive laboratory work. Our investigation highlights the potential of PLMs for understanding and harnessing acidophilic proteins in scientific and industrial settings. We introduce the ACE model, which combines a simple logistic regression classifier with embeddings derived from protein sequences processed by the ProtT5 PLM. On an independent test set, the model achieves an accuracy of 0.91, an F1-score of 0.93, and a Matthews correlation coefficient (MCC) of 0.76. To our knowledge, this is the first application of pre-trained PLM embeddings to acidophilic protein classification. The ACE model serves as a powerful tool for exploring protein acidophilicity, paving the way for future advances in protein design and engineering.
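The abstract describes a two-stage pipeline: ProtT5 turns each sequence into embeddings, a logistic regression classifier labels the protein as acidophilic or not, and performance is reported as accuracy, F1-score and MCC. The sketch below illustrates that flow in plain Python under stated assumptions, not as the authors' implementation: it mean-pools per-residue vectors into one per-protein vector (the abstract does not say how embeddings are pooled), trains logistic regression with simple batch gradient descent rather than whatever solver the authors used, and uses tiny two-dimensional toy vectors in place of real, far higher-dimensional ProtT5 output. All names (`mean_pool`, `train_logreg`, `predict`, `evaluate`, `TOY_PROTEINS`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Vector]) -> Vector:
    """Average per-residue vectors into one fixed-length per-protein vector.

    Assumption: mean pooling; the abstract does not state the pooling step.
    """
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[j] for vec in residue_embeddings) / n for j in range(dim)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500,
                 l2: float = 0.0) -> Tuple[Vector, float]:
    """Fit weights and bias by full-batch gradient descent on log-loss,
    with an optional L2 penalty on the weights."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Derivative of log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, x: Vector, threshold: float = 0.5) -> int:
    """Return 1 (acidophilic) if P(acidophilic | x) >= threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F1-score and Matthews correlation coefficient (MCC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # MCC uses all four confusion-matrix cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical toy data: each protein is a list of 2-D "residue embeddings"
# standing in for real ProtT5 output; label 1 = acidophilic, 0 = not.
TOY_PROTEINS = [
    ([[1.0, 0.1], [0.8, 0.3]], 1),
    ([[0.9, 0.0], [1.1, 0.2]], 1),
    ([[0.1, 0.9], [0.3, 0.7]], 0),
    ([[0.0, 1.0], [0.2, 1.2]], 0),
]

if __name__ == "__main__":
    X = [mean_pool(residues) for residues, _ in TOY_PROTEINS]
    y = [label for _, label in TOY_PROTEINS]
    w, b = train_logreg(X, y)
    print(evaluate(y, [predict(w, b, x) for x in X]))
```

The demo scores the model on its own training data, which only checks that the pieces fit together; the figures in the abstract come from an independent test set. In practice, scikit-learn's `LogisticRegression` together with `accuracy_score`, `f1_score` and `matthews_corrcoef` from `sklearn.metrics` would replace these hand-written helpers; the stdlib version simply makes each step explicit.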

Published in: Computational Biology and Chemistry

Similar Papers

Identifying chronic disease patients using predictive algorithms in pharmacy administrative claims: an application in rheumatoid arthritis
Ervant J Maksabedian Hernandez ... Jessica Tiu
Journal of Medical Economics | VOL. 24 | 01 Jan 2020

Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing
Emily Chia-Yu Su ... Ting-Yi Sung
BMC Bioinformatics | VOL. 13 | 01 Dec 2012

ISKIN: Integrated application of machine learning and Mondrian conformal prediction to detect skin sensitizers in cosmetic raw materials
Weikaixin Kong ... Chao Peng
SmartMat | 15 Feb 2024

Selection of relevant features from amino acids enables development of robust classifiers
Rishi Das Roy ... Debasis Dash
Amino Acids | VOL. 46 | 07 Mar 2014
