ProteinGLUE multi-task benchmark suite for self-supervised protein modeling.

Henriette Capel, Maurits Dijkstra, Robin Weiler, Reinier Vleugels, Peter Bloem, K Anton Feenstra

doi:10.1038/s41598-022-19608-4


Open Access

https://doi.org/10.1038/s41598-022-19608-4


Journal: Scientific Reports
Publication Date: Sep 26, 2022
Citations: 8
License type: CC BY 4.0

Affiliation: Vrije Universiteit Amsterdam

Abstract

Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous, which makes it difficult to compare models and methods. Moreover, models are often evaluated on only one or two downstream tasks, so it is unclear whether they capture generally useful properties. We introduce the ProteinGLUE benchmark: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks: masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance than no pre-training on a variety of downstream tasks, such as secondary structure prediction and protein interaction interface prediction. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets are available from https://github.com/ibivu/protein-glue.
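
To make the masked symbol prediction task more concrete, the following minimal Python sketch shows how a protein sequence could be corrupted to produce a training example: some residues are hidden, and the model would be asked to recover them. This illustrates the general idea only and is not the authors' implementation. The mask symbol `#`, the 15% masking rate, the function name `mask_sequence`, and the example sequence are all assumptions made for illustration and do not come from the abstract.

```python
import random

MASK_TOKEN = "#"  # hypothetical mask symbol; not taken from the paper


def mask_sequence(sequence, mask_rate=0.15, seed=None):
    """Hide a random subset of residues.

    Returns the masked sequence and a dict that maps each masked
    position to the residue that was hidden there (the prediction target).
    The 15% default rate is an assumed, BERT-style convention.
    """
    if not sequence:
        return "", {}
    rng = random.Random(seed)
    # Mask at least one residue so every example has a target.
    n_masked = max(1, round(len(sequence) * mask_rate))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    residues = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = residues[pos]
        residues[pos] = MASK_TOKEN
    return "".join(residues), targets


if __name__ == "__main__":
    # Arbitrary example sequence, used only for demonstration.
    masked, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
    print(masked)
    print(targets)
```

A model pre-trained on this task would receive the masked string as input and be scored on how well it predicts the residues stored in `targets`. This sketch replaces every selected position with the mask symbol, which is a simplification of the masking schemes used in practice.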
