Abstract
Language models (LMs) pretrained on a large text corpus and fine-tuned on task data achieve remarkable performance on document classification. Recently, adaptive pretraining methods, which re-pretrain a pretrained LM on an additional dataset from the same domain as the target task to bridge the domain discrepancy, have reported significant performance improvements. However, current adaptive pretraining methods focus only on the domain gap between the pretraining data and the fine-tuning data. Writing style also differs: pretraining data such as Wikipedia is written in a literary style, whereas task data such as customer reviews is usually written in a colloquial style. In this work, we propose a colloquial-adaptive pretraining method that re-pretrains the pretrained LM on informal sentences to generalize the LM to colloquial style. We evaluate the proposed method on multi-emotion classification datasets. The experimental results show that the proposed method improves classification performance on both low- and high-resource data.
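To make the described procedure concrete, the sketch below shows one plausible way to run a colloquial-adaptive pretraining step as continued masked-language-model training on a corpus of informal sentences before downstream fine-tuning. The backbone model, corpus file, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of colloquial-adaptive pretraining: continue masked-language-model
# (MLM) pretraining on informal text, then fine-tune the adapted checkpoint on the task.
# Model name, data file, and hyperparameters are assumptions for illustration only.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "bert-base-uncased"  # assumed pretrained backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Assumed corpus of colloquial/informal sentences, one sentence per line.
corpus = load_dataset("text", data_files={"train": "colloquial_sentences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="colloquial-adapted-lm",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
# The adapted checkpoint in "colloquial-adapted-lm" would then be fine-tuned
# on the downstream multi-emotion classification dataset as usual.
```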