Abstract

Named entity recognition (NER) is a challenging task in natural language processing (NLP). The NLP research community has increasingly turned to well-known deep neural network (DNN) techniques to extract named entities from text. We study the effect of the existing GloVe and fastText word embeddings on NER performance, and we examine the impact of combining additional word- and character-level input features with CNN and bidirectional LSTM models, with and without a CRF layer. In the proposed work, we do not preprocess the data and do not use any lexicon for further enhancement; the experiments focus mainly on the effectiveness of the word and character features for NER. The F1 measure is used to compare the effectiveness of our additional input features with Chiu's word and character features. Our best result is obtained using GloVe 840B word embeddings together with word-pattern and character-pattern inputs to a CNN and a two-layer bidirectional LSTM with CRF. This configuration achieves an F1 score of 91.10% on CoNLL-2003 and outperforms Chiu's state-of-the-art (SOTA) result.
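
As a rough illustration of the architecture described above (not the authors' implementation), the sketch below combines character-level CNN features with pretrained word embeddings and passes them through a two-layer bidirectional LSTM to produce per-token emission scores; a CRF layer would sit on top of these scores, and the paper's additional word- and character-pattern features are omitted. All dimensions, names, and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CharCNNBiLSTMTagger(nn.Module):
    """Sketch of a char-CNN + word-embedding encoder with a 2-layer BiLSTM."""
    def __init__(self, word_vocab, char_vocab, num_tags,
                 word_dim=300, char_dim=30, char_filters=30, lstm_hidden=200):
        super().__init__()
        # Word embeddings; in the paper these would be initialised from GloVe 840B.
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Character-level CNN: one convolution + max-over-time pooling per word.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        # Two stacked bidirectional LSTM layers over the concatenated features.
        self.bilstm = nn.LSTM(word_dim + char_filters, lstm_hidden,
                              num_layers=2, bidirectional=True, batch_first=True)
        # Linear projection to per-token tag scores (emissions for a CRF layer).
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, s, w = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, w, -1).transpose(1, 2)
        char_feats = self.char_cnn(chars).max(dim=2).values.view(b, s, -1)
        feats = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        out, _ = self.bilstm(feats)
        # These emission scores would be decoded with a CRF (e.g. pytorch-crf).
        return self.emissions(out)
```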
