Abstract

This paper attempts to analyze the Korean sentence classification system. Sentence classification is the task of classifying an input sentence into predefined categories. However, spelling or spacing errors in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach, Integrated Eojeol Embedding (an eojeol is a Korean syntactic word delimited by spaces), to reduce the effect of poorly analyzed morphemes on sentence classification. The paper also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that applying the proposed methods to existing sentence classifiers increases the classification accuracy on erroneous sentences by 8% to 15%.

Highlights

  • Sentence classification is the task of classifying the given input text into one of the predefined categories

  • Two noise insertion methods are proposed to overcome the weakness of the eojeol embedding-based approaches and to add noise to the training data automatically

  • The proposed system is evaluated against the Korean chatbot intent classification corpus and movie sentiment analysis corpus


Summary

INTRODUCTION

Sentence classification is the task of classifying the given input text into one of the predefined categories. Many researchers first tokenize Korean input sentences into morphemes with a morphological analyzer and then feed the resulting morphemes into a sentence classification system [5], [7], [8]. Such morpheme embedding-based approaches may work well on grammatically correct sentences, but morphological analyzers are likely to fail on erroneous sentences, leading to wrong classification results. To handle the classification of erroneous Korean sentences, a novel approach, Integrated Eojeol Embedding (IEE), is proposed.
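To make the problem concrete, the following is a minimal Python sketch of how spacing errors change the eojeol units a classifier sees. This summary does not describe the paper's two noise insertion methods, so the function `insert_spacing_noise` and its parameters `p_drop` and `p_add` are hypothetical illustrations of spacing noise in general, not the authors' method.

```python
import random
from typing import List


def split_eojeols(sentence: str) -> List[str]:
    """Split a sentence into eojeols, i.e. whitespace-delimited units."""
    return sentence.split()


def insert_spacing_noise(sentence: str, p_drop: float, p_add: float,
                         rng: random.Random) -> str:
    """Hypothetical spacing noise (an assumption, not the paper's method).

    Each existing space is dropped with probability p_drop, and a space is
    inserted between two adjacent non-space characters with probability p_add.
    """
    out = []
    for i, ch in enumerate(sentence):
        if ch == " ":
            # Simulate a missing-space error.
            if rng.random() < p_drop:
                continue
            out.append(ch)
        else:
            out.append(ch)
            nxt = sentence[i + 1] if i + 1 < len(sentence) else " "
            # Simulate a spurious-space error inside an eojeol.
            if nxt != " " and rng.random() < p_add:
                out.append(" ")
    return "".join(out)


rng = random.Random(0)
clean = "오늘 날씨 어때"
merged = insert_spacing_noise(clean, p_drop=1.0, p_add=0.0, rng=rng)
print(split_eojeols(clean))   # three eojeols
print(split_eojeols(merged))  # one merged unit after all spaces are dropped
```

With all spaces dropped, the three eojeols collapse into a single unseen unit. This is the kind of input on which a morphological analyzer, and any classifier relying on its output, is likely to fail.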

RELATED WORK
MODELS
CORPUS
Findings
CONCLUSION