Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction.

Zeyu Luo, Junhao Liu, Yawen Sun, Rui Wang, Yu-Juan Zhang, Zongqing Chen

doi:10.1093/bib/bbad534


Open Access

https://doi.org/10.1093/bib/bbad534


Journal: Briefings in Bioinformatics	Publication Date: Jan 22, 2024
Citations: 3	License type: public-domain

Affiliation: Chongqing Normal University

Abstract

Large language models (LLMs) are increasingly applied to biological prediction, where their capacity for self-supervised learning is used to create feature representations of amino acid sequences. These models have set a new benchmark on downstream tasks such as subcellular localization. However, previous studies have primarily focused on either the structural design of models or differing strategies for fine-tuning, largely overlooking the nature of the features derived from LLMs. In this research, we propose different ESM2 representation extraction strategies that consider both the character type and the position within the ESM2 input sequence. Using model dimensionality reduction, predictive analysis and interpretability techniques, we illuminate potential associations between diverse feature types and specific subcellular localizations. In particular, predictions for the mitochondrion and the Golgi apparatus favor features from segments closer to the N-terminus, and phosphorylation site-based features could mirror phosphorylation properties. We also evaluate the prediction performance and interpretability robustness of Random Forest and Deep Neural Networks with varied feature inputs. This work offers novel insights into maximizing LLMs' utility, understanding their mechanisms, and extracting biological domain knowledge. The code, feature extraction API, and all relevant materials are available at https://github.com/yujuan-zhang/feature-representation-for-LLMs.
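
To make the idea of extracting features by character type and by position concrete, the following is a minimal sketch and not the authors' API. It assumes the per-token embeddings have already been computed by an ESM2 model and arrive as one vector per token, with a special start token before the residues and a special end token after them. The function names, the number of segments, and the use of S/T/Y residues as stand-ins for phosphorylation sites are all illustrative assumptions.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(rows: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors column by column."""
    if not rows:
        raise ValueError("no vectors to pool")
    return [sum(col) / len(col) for col in zip(*rows)]


def extract_features(
    sequence: str,
    token_embeddings: Sequence[Vector],
    n_segments: int = 3,        # hypothetical choice, not taken from the paper
    site_residues: str = "STY",  # assumed proxy for phosphorylation sites
) -> Dict[str, Vector]:
    """Build several candidate feature vectors from per-token embeddings.

    Assumes token_embeddings has len(sequence) + 2 rows:
    start token, one row per residue, end token.
    """
    if len(token_embeddings) != len(sequence) + 2:
        raise ValueError("expected one row per residue plus two special tokens")

    residues = token_embeddings[1:-1]
    features: Dict[str, Vector] = {
        # Character-type features: the two special tokens and the residue mean.
        "cls": list(token_embeddings[0]),
        "eos": list(token_embeddings[-1]),
        "mean": mean_pool(residues),
    }

    # Position features: contiguous segments, segment0 nearest the N-terminus.
    seg_len = -(-len(residues) // n_segments)  # ceiling division
    for i in range(n_segments):
        chunk = residues[i * seg_len:(i + 1) * seg_len]
        if chunk:
            features[f"segment{i}"] = mean_pool(chunk)

    # Site features: average only over residues of the selected types.
    site_rows = [vec for aa, vec in zip(sequence, residues) if aa in site_residues]
    if site_rows:
        features["site_mean"] = mean_pool(site_rows)
    return features


# Toy input: 6 residues, 2-dimensional stand-in embeddings (not real ESM2 output).
toy_embeddings = [[float(i), float(2 * i)] for i in range(8)]
features = extract_features("MSTAYK", toy_embeddings)
```

Each entry in `features` could then be fed separately to a classifier such as a Random Forest to compare how well each feature type predicts a given localization, which is the kind of comparison the abstract describes.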


