Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models

Eunchan Lee, Changhyeon Lee, Sangtae Ahn

doi:10.3390/app12094522

Abstract

Recently, transformer-based pretrained language models have demonstrated strong performance in natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) has achieved outstanding performance through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may be effective mainly for English NLU tasks; its effectiveness for other languages such as Korean is limited. Thus, the applicability of BERT-based language models pretrained in languages other than English to NLU tasks in those languages must be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and their expected applicability to Korean NLU tasks. We used the climate technology dataset, a large Korean text classification dataset of research proposals spanning 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed best on this Korean multiclass text classification task. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
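The comparison described in the abstract can be illustrated with a minimal sketch of how several classifiers might be ranked on a shared multiclass test set. The abstract does not state which metrics were used, so accuracy and macro-averaged F1 are assumptions here; the function names and the model names in the usage note are hypothetical, and the predictions would in practice come from each fine-tuned model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 over labels 0..num_classes-1.

    A class with no gold and no predicted examples contributes an F1 of 0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    f1_sum = 0.0
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_sum += (2 * tp[c] / denom) if denom else 0.0
    return f1_sum / num_classes


def rank_models(y_true, predictions_by_model, num_classes=45):
    """Return (name, accuracy, macro_f1) rows sorted by macro-F1, best first.

    num_classes defaults to 45 to match the dataset in the abstract.
    """
    rows = [
        (name, accuracy(y_true, preds), macro_f1(y_true, preds, num_classes))
        for name, preds in predictions_by_model.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For example, `rank_models(gold_labels, {"model_a": preds_a, "model_b": preds_b})` would list the hypothetical `model_a` and `model_b` in order of macro-F1. Macro averaging weights all classes equally, which may matter when some of the 45 classes are rare.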

Journal: Applied Sciences
Publication Date: Apr 29, 2022
Citations: 2
License type: CC BY 4.0

Similar Papers

On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency
11 May 2022

Enhancing performance of transformer-based models in natural language understanding through word importance embedding
Seung-Kyu Hong ... Hyuk-Yoon Kwon
Knowledge-Based Systems | VOL. 304
23 Aug 2024

KLEJ: Comprehensive Benchmark for Polish Language Understanding
Piotr Rybak ... Ireneusz Gawlik
01 Jan 2020

Arabic abstractive text summarization using RNN-based and transformer-based architectures
Mohammad Bani-Almarjeh ... Mohamad-Bassam Kurdy
Information Processing & Management | VOL. 60
26 Dec 2022
