PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction.

Chakradhar Guntuboina, Seongwon Kim, Parisa Mollaei, Amir Barati Farimani, Adrita Das

doi:10.1021/acs.jpclett.3c02398

Open Access

https://doi.org/10.1021/acs.jpclett.3c02398

Abstract

Recent advances in language models have given the protein modeling community a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in large language models, we present PeptideBERT, a protein language model specifically tailored for predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT uses the pretrained ProtBERT transformer model with 12 attention heads and 12 hidden layers. Through fine-tuning the pretrained model for the three downstream tasks, our model is state of the art (SOTA) in predicting hemolysis, which is crucial for determining a peptide's potential to induce red blood cell lysis, as well as in predicting nonfouling properties. Leveraging primarily shorter sequences and a data set with negative samples predominantly associated with insoluble peptides, our model showcases remarkable performance.
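
To make "representing protein sequences as text" concrete, here is a minimal sketch of how a peptide string might be turned into token IDs for a BERT-style model. It is an illustration under stated assumptions, not PeptideBERT's actual preprocessing, which the abstract does not describe. The helper names (`to_residue_text`, `encode`), the toy vocabulary and its ID numbering, and the `max_len` default are all hypothetical. The space-separated residues and the mapping of rare residues (U, Z, O, B) to X follow a convention commonly used with ProtBERT-style models.

```python
import re

# Toy vocabulary for illustration only; these IDs are NOT ProtBERT's real token IDs.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY") + ["X"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}


def to_residue_text(sequence: str) -> str:
    """Turn a raw peptide string into space-separated residue 'words'."""
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Assumption: rare or ambiguous residues are collapsed to X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    return " ".join(cleaned)


def encode(sequence: str, max_len: int = 16) -> list:
    """Map a peptide to fixed-length token IDs: [CLS] residues [SEP] padding."""
    # Reserve two positions for [CLS] and [SEP] when truncating.
    residues = to_residue_text(sequence).split()[: max_len - 2]
    tokens = ["[CLS]"] + residues + ["[SEP]"]
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


if __name__ == "__main__":
    print(to_residue_text("gLfdiv k"))
    print(encode("ACD", max_len=8))
```

In a fine-tuning setup such as the one the abstract describes, IDs like these would be fed to the pretrained encoder, and a small classification head would be trained per task (hemolysis, solubility, nonfouling). That part is omitted here because it depends on model details not given in this section.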

Journal: The Journal of Physical Chemistry Letters
Publication Date: Nov 13, 2023
Citations: 12
License type: CC BY 4.0

Similar Papers

Fine-Tuning Language Models For Semi-Supervised Text Mining
Xinyu Chen ... Ian Beaver
10 Dec 2020

Representation Learning for Stack Overflow Posts: How Far Are We?
Junda He ... Ting Zhang
ACM Transactions on Software Engineering and Methodology | VOL. 33
15 Mar 2024

Large-scale chemical language representations capture molecular structure and properties
Jerret Ross ... Payel Das
Nature Machine Intelligence | VOL. 4
21 Dec 2022

A survey of GPT-3 family large language models including ChatGPT and GPT-4
Katikapalli Subramanyam Kalyan
Natural Language Processing Journal | VOL. 6
19 Dec 2023
