Korean Erroneous Sentence Classification With Integrated Eojeol Embedding

Donghyun Choi,Eunggyun Kim,Ilnam Park,Dong Ryeol Shin,Myeongcheol Shin

doi:10.1109/access.2021.3085864

Open Access

https://doi.org/10.1109/access.2021.3085864

Abstract

This paper analyzes Korean sentence classification systems. Sentence classification is the task of assigning an input sentence to one of a set of predefined categories. However, spelling or spacing errors in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach, Integrated Eojeol Embedding (an eojeol is a Korean syntactic word delimited by spaces), to reduce the effect of poorly analyzed morphemes on sentence classification. The paper also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that applying the proposed methods to existing sentence classifiers increases classification accuracy on erroneous sentences by 8% to 15%.

Highlights

Sentence classification is the task of classifying the given input text into one of the predefined categories
Two noise insertion methods are proposed to overcome the weakness of eojeol embedding-based approaches and to add noise to the training data automatically
The proposed system is evaluated on the Korean chatbot intent classification corpus and the movie sentiment analysis corpus

Summary

INTRODUCTION

Sentence classification is the task of classifying the given input text into one of the predefined categories. Various researchers tokenize Korean input sentences into morphemes first with a morpheme analyzer, and the resultant morphemes are fed into a sentence classification system [5], [7], [8]. Such morpheme embedding-based approaches may work well on grammatically correct sentences, but the morpheme analyzers are likely to fail for erroneous sentences, leading to wrong classification results. A novel approach of Integrated Eojeol Embedding (IEE) is proposed to handle the erroneous Korean sentence classification problem.
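To make the failure mode concrete, here is a minimal Python sketch of how a spacing error changes the eojeol sequence a downstream analyzer receives. It is an illustration only, not the paper's noise insertion methods or its IEE model: the function names `split_eojeols` and `drop_spaces`, the `drop_prob` parameter, and the sample sentence are all assumptions made for this example.

```python
import random
from typing import List


def split_eojeols(sentence: str) -> List[str]:
    """Split a sentence into eojeols, i.e. units delimited by whitespace."""
    return sentence.split()


def drop_spaces(sentence: str, drop_prob: float, rng: random.Random) -> str:
    """Hypothetical noise function: delete each space with probability drop_prob.

    This only simulates missing-space errors; it is not the method from the paper.
    """
    eojeols = split_eojeols(sentence)
    if not eojeols:
        return ""
    merged = [eojeols[0]]
    for eojeol in eojeols[1:]:
        if rng.random() < drop_prob:
            merged[-1] += eojeol  # space removed: glue onto the previous eojeol
        else:
            merged.append(eojeol)  # space kept: start a new eojeol
    return " ".join(merged)


clean = "오늘 날씨 어때"
noisy = drop_spaces(clean, drop_prob=1.0, rng=random.Random(0))

print(split_eojeols(clean))  # three eojeols
print(split_eojeols(noisy))  # one merged eojeol once every space is dropped
```

Once the spaces are gone, the same content arrives as a single unsegmented eojeol, which is the kind of input on which a morpheme analyzer is likely to fail.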

RELATED WORK

MODELS

CORPUS

Findings

CONCLUSION

Journal: IEEE access : practical innovations, open solutions	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0

Similar Papers

Reliable automatic word spacing using a space insertion and correction model based on neural networks in Korean
Sihyung Kim ... Harksoo Kim
Information Processing and Management | VOL. 56
27 Feb 2019

Robust Sentence Classification by Solving Out-of-Vocabulary Problem with Auxiliary Word Predictor
Sang-Seok Park ... Seyoung Park
01 Jan 2019

Sentence representation and classification using attention and additional language information
Hongli Deng ... Lei Zhang
01 Mar 2018

Retrofitting Contextualized Word Embeddings with Paraphrases
Weijia Shi ... Pei Zhou
01 Jan 2019
