Abstract
This paper analyzes the hierarchical structure of named entities in the NIKL Named Entity Corpus, which is annotated with 553,830 flat named entity tags. The analysis serves as a basis for developing a method to build a Korean nested named entity corpus. Flat named entity recognition identifies mentions as linear spans. The nested approach, by contrast, analyzes the hierarchical internal structure of named entities, which may consist of smaller component named entities. We extracted candidate mentions for nested named entity analysis from the NIKL Named Entity Corpus and classified them into three categories: serial named entities, complex named entities, and phrases with a named entity head. These candidates were then reviewed manually to select the targets of nested named entity analysis. Finally, we discuss the span and internal structure of named entities and propose principles and guidelines for constructing the Korean nested named entity corpus.
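The difference between flat and nested annotation can be illustrated with a minimal sketch. The English example phrase, the labels `ORG` and `LOC`, and the names `Mention` and `nested_children` are all hypothetical choices made for illustration; they are not taken from the NIKL corpus or its tag set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A labeled span over a token sequence (hypothetical representation)."""
    start: int   # index of the first token (inclusive)
    end: int     # index one past the last token (exclusive)
    label: str   # illustrative label, not a tag from the NIKL tag set


def nested_children(outer, mentions):
    """Return the mentions whose span lies within `outer`, excluding `outer` itself."""
    return [
        m for m in mentions
        if m != outer and outer.start <= m.start and m.end <= outer.end
    ]


# Illustrative phrase, not drawn from the corpus.
tokens = ["Seoul", "National", "University"]

# Flat annotation: one linear span covers the whole mention.
flat = [Mention(0, 3, "ORG")]

# Nested annotation: the same outer span plus a component entity inside it.
nested = [Mention(0, 3, "ORG"), Mention(0, 1, "LOC")]

inner = nested_children(nested[0], nested)
```

Under these assumptions, the flat annotation has no component mentions, while the nested annotation exposes the inner span as a child of the outer one.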