Abstract

Named entity recognition (NER) is the natural language processing task of assigning predefined entity types, such as person, place name, or organization, to tokens in a sentence. NER is important because it significantly affects the performance of subsequent analyses, including semantic search, question answering, and machine translation. The performance of NER for English has greatly improved with the advent of deep learning techniques and large English datasets. However, only a few studies have been conducted for languages spoken by ethnic minorities, such as Korean (i.e., Hangul), because an appropriate dataset for NER is difficult to obtain. In this study, we propose using various data augmentation techniques to improve the performance of NER on Hangul datasets. Our methods can be applied without pre-trained models or external pre-building. We demonstrate the usefulness of the presented data augmentation techniques on the Changwon University-Naver Challenge dataset and find that even a small dataset can achieve satisfactory performance for Hangul NER.

