Application of BERT to Enable Gene Classification Based on Clinical Evidence.

Yuhan Su, Yong Yu, Haotian Xie, Na Zhao, Shiyan Dong, Zhaogang Yang, Hongxin Xiang

doi:10.1155/2020/5491963

Abstract

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. According to the literature, genetic mutations are still classified manually, a process that is pathologist-dependent, subjective, and time-consuming. With the advent of next-generation sequencing technologies, scientists have proposed computational approaches for the automatic analysis of mutations to improve the accuracy of clinical interpretation. Nevertheless, challenges such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation have limited the development of algorithms. To overcome these difficulties, we adapted a deep learning method, Bidirectional Encoder Representations from Transformers (BERT), to classify genetic mutations based on text evidence from an annotated database. During training, three challenging features of the data were addressed: the extreme length of texts, biased data presentation, and high repeatability. The BERT+abstract model demonstrates satisfactory results, with 0.80 logarithmic loss, 0.6837 recall, and 0.705 F-measure. It is therefore feasible for BERT to classify genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research on tumor progression, diagnosis, and the design of more precise and effective treatments.
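
For readers unfamiliar with the reported metrics, the block below gives the conventional definitions of multiclass logarithmic loss and the F-measure. The notation is an assumption made here for illustration and is not taken from the paper: N is the number of samples, M the number of classes, y_ij is 1 when sample i belongs to class j and 0 otherwise, and p_ij is the predicted probability that sample i belongs to class j. PRE and REC denote precision and recall.

```latex
\mathrm{Logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log p_{ij},
\qquad
F_1 = \frac{2\,\mathrm{PRE}\cdot\mathrm{REC}}{\mathrm{PRE}+\mathrm{REC}}
```

Lower logarithmic loss is better, while recall and F-measure improve toward 1.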

Highlights

Genomic, transcriptomic, and epigenomic studies have benefited from the development of inexpensive next-generation sequencing technologies, which play essential roles in exploring tumor biology [1,2,3].
Parameters of the pretrained Bidirectional Encoder Representations from Transformers (BERT)-base models are loaded into the downstream BERT classification model so that our model can be fine-tuned from them, which significantly reduces convergence time and increases accuracy.
This paper evaluates the performance of the model using several evaluation indicators: logarithmic loss (Logloss), recall (REC), precision (PRE), F1 score, receiver operating characteristic (ROC) curve, and confusion matrix.
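
Several of the evaluation indicators listed above can be computed with a few lines of code. The following is a minimal sketch using only the Python standard library, not the authors' evaluation code. The toy labels and probabilities are invented for illustration, the function names are hypothetical, and macro averaging across classes is an assumption, since this page does not state which averaging the paper used.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip so that a hard 0 or 1 probability never produces log(0).
        total += -math.log(min(max(p[label], eps), 1.0 - eps))
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def macro_scores(matrix):
    """Macro-averaged precision, recall, and F1 (an assumed averaging)."""
    n = len(matrix)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        pre = tp / predicted_k if predicted_k else 0.0
        rec = tp / actual_k if actual_k else 0.0
        f1 = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
        precisions.append(pre)
        recalls.append(rec)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example (invented data): 4 texts, 3 mutation classes.
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
# Predicted class = index of the highest probability.
y_pred = [max(range(len(p)), key=p.__getitem__) for p in probs]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = macro_scores(matrix)
loss = log_loss(y_true, probs)

print("confusion matrix:", matrix)
print("precision, recall, F1:", precision, recall, f1)
print("log loss:", loss)
```

Logarithmic loss is computed from the predicted probabilities, whereas precision, recall, F1, and the confusion matrix depend only on the predicted class, so a model can rank well on one and poorly on the other.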

Summary

Introduction

Genomic, transcriptomic, and epigenomic studies have benefited from the development of inexpensive next-generation sequencing technologies, which play essential roles in exploring tumor biology [1,2,3]. Advanced machine learning (ML) methods, such as Light Gradient Boosting Machine (LightGBM), have been proposed to enable gene multiclassification based on complex literature [25]. These methods are limited by complex calculations when applied to large-scale datasets, such as genomic-related literature datasets that contain millions, or even billions, of annotated training examples [26, 27]. The performance of ML also depends on feature extraction, which requires professional knowledge and long-term processing [28,29,30,31]. To overcome these difficulties, deep learning (DL) has emerged to handle large-scale and complex datasets, since its performance increases as datasets grow [32,33,34]. We improve the BERT method to classify complex clinical texts and obtain 0.8074 logarithmic loss, 0.6837 recall, and 0.705 F-measure.



Journal: BioMed Research International	Publication Date: Oct 7, 2020
Citations: 10	License type: CC BY 4.0


