Abstract

Language representation models have recently drawn considerable attention in natural language processing owing to their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple yet powerful language model that achieved new state-of-the-art performance. BERT adopts contextualized word embeddings to capture the semantics of words in the context in which they appear. In this study, we present a novel technique that incorporates a BERT-based multilingual model into bioinformatics to represent the information in DNA sequences. We treat DNA sequences as natural-language sentences and use BERT models to transform them into fixed-length numerical matrices. As a case study, we applied our method to DNA enhancer prediction, a well-known and challenging problem in this field. We observed that our BERT-based features improved sensitivity, specificity, accuracy, and the Matthews correlation coefficient by 5-10% compared with the current state-of-the-art features in bioinformatics. Moreover, further experiments show that deep learning (represented here by 2D convolutional neural networks, CNNs) holds potential to learn BERT features better than traditional machine learning techniques. In conclusion, we suggest that BERT and 2D CNNs could open a new avenue for biological modeling using sequence information.
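To make the pipeline described above concrete, the following is a minimal sketch of the idea, not the authors' exact implementation: a DNA sequence is split into overlapping k-mers that play the role of words, encoded with a multilingual BERT model into a fixed-length matrix, and that matrix is fed to a small 2D CNN as a one-channel image. The k-mer size, the `bert-base-multilingual-cased` checkpoint, the fixed length of 512 tokens, and the CNN layer sizes are all illustrative assumptions. The sketch assumes PyTorch and the HuggingFace `transformers` library.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint, for illustration
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)
model.eval()

def dna_to_matrix(sequence: str, k: int = 3, max_len: int = 512) -> torch.Tensor:
    """Split a DNA sequence into overlapping k-mers ("words"), encode them
    with BERT, and return a fixed-size (max_len, hidden_size) matrix."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    inputs = tokenizer(" ".join(kmers),
                       return_tensors="pt",
                       padding="max_length",   # pad short sequences to max_len
                       truncation=True,        # cut long sequences at max_len
                       max_length=max_len)
    with torch.no_grad():
        outputs = model(**inputs)
    # Last hidden states: (1, max_len, hidden_size) -> (max_len, hidden_size)
    return outputs.last_hidden_state.squeeze(0)

matrix = dna_to_matrix("ACGTACGTGGCTAACGT")
print(matrix.shape)  # torch.Size([512, 768])

# Treat the embedding matrix as a one-channel "image" for a 2D CNN,
# as the abstract suggests; the layer sizes here are assumptions.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(4),
    nn.Flatten(),
    nn.LazyLinear(2),  # two classes: enhancer vs. non-enhancer
)
logits = cnn(matrix.unsqueeze(0).unsqueeze(0))  # shape: (1, 2)
```

In practice such a classifier would be trained end-to-end on labeled enhancer and non-enhancer sequences; the sketch only illustrates the sequence-to-matrix-to-CNN data flow.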
