Applications of transformer-based language models in bioinformatics: a survey.

Shuang Zhang, Yuti Liu, Qiao Liu, Rui Fan, Shuang Chen, Wanwen Zeng, Alex Bateman

doi:10.1093/bioadv/vbad001

Open Access

https://doi.org/10.1093/bioadv/vbad001

Journal: Bioinformatics Advances
Publication Date: Jan 5, 2023
Citations: 26
License type: CC BY 4.0

Affiliation: Nankai University, Stanford University

Abstract

Transformer-based language models, including the vanilla transformer, BERT and GPT-3, have achieved revolutionary breakthroughs in the field of natural language processing (NLP). Since there are inherent similarities between various biological sequences and natural languages, the remarkable interpretability and adaptability of these models have prompted a new wave of their application in bioinformatics research. To provide a timely and comprehensive review, we introduce key developments of transformer-based language models by describing the detailed structure of transformers and summarize their contribution to a wide range of bioinformatics research from basic sequence analysis to drug discovery. While transformer-based applications in bioinformatics are diverse and multifaceted, we identify and discuss the common challenges, including heterogeneity of training data, computational expense and model interpretability, and opportunities in the context of bioinformatics research. We hope that the broader community of NLP researchers, bioinformaticians and biologists will be brought together to foster future research and development in transformer-based language models, and inspire novel bioinformatics applications that are unattainable by traditional methods. Supplementary data are available at Bioinformatics Advances online.

Full Text