A Named Entity Recognition Method Based on Decomposition and Concatenation of Word Chunks

Tomoya Iwakura,Hiroya Takamura,Manabu Okumura

doi:10.1145/2499955.2499958

Abstract

We propose a named entity (NE) recognition method in which word chunks are repeatedly decomposed and concatenated. Our method identifies word chunks with a base chunker, such as a noun phrase chunker, and then recognizes NEs from the recognized word chunk sequences. By using word chunks, we can obtain features that cannot be obtained in word-sequence-based recognition methods, such as the first word of a word chunk, the last word of a word chunk, and so on. However, each word chunk may include a part of an NE or multiple NEs. To solve this problem, we use the following operators: SHIFT for separating the first word from a word chunk, POP for separating the last word from a word chunk, JOIN for concatenating two word chunks, and REDUCE for assigning an NE label to a word chunk. We evaluate our method on a Japanese NE recognition dataset that includes about 200,000 annotations of 191 types of NEs from over 8,500 news articles. The experimental results show that the training and processing speeds of our method are faster than those of a linear-chain structured perceptron and a semi-Markov perceptron, while maintaining high accuracy.

Full Text