Abstract
In some languages, Named Entity Recognition (NER) is severely hindered by complex linguistic structures, such as inflection, which can confuse data-driven models trying to determine a word's actual meaning. This work aims to alleviate these problems by introducing a novel neural network based on morphological and syntactic grammars. The experiments were performed on four Nordic languages, which have many grammar rules. The model is named the NorG network (Nor: Nordic Languages, G: Grammar). In addition to learning from the text content, the NorG network also learns from each word's written form, POS tag, and dependency relations. The proposed network uses a bidirectional Long Short-Term Memory (Bi-LSTM) layer to capture word-level grammar and a bidirectional Graph Attention (Bi-GAT) layer to capture sentence-level grammar. Experimental results on the four languages show that the grammar-assisted network significantly improves on the baselines. We also investigate how the NorG network handles each grammar component through exploratory experiments.
Highlights
Machine learning models have been widely applied to Natural Language Processing (NLP) techniques, replacing the previous rule-based models and showing better performance
Most leading Named Entity Recognition (NER) models are based on BERT [9], a type of word embedding pretrained by the Transformer architecture [25]
Our model was evaluated on the NorNE (Norwegian Bokmål), NorNE (Norwegian Nynorsk), DaNE (Danish), and Turku NER (Finnish) datasets, whose linguistic structures are annotated in CoNLL-U format
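For readers unfamiliar with CoNLL-U, the following is a minimal sketch of how the grammar signals mentioned above (written form, POS tag, dependency) could be read from one annotated sentence. It relies only on the standard ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The names `Token`, `parse_conllu_sentence`, and `dependency_edges` are hypothetical, and the sample sentence and its annotation are hand-written for illustration rather than taken from any of the datasets. It is not the authors' preprocessing code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    idx: int      # ID column (1-based position in the sentence)
    form: str     # FORM column (the written word form)
    upos: str     # UPOS column (universal POS tag)
    head: int     # HEAD column (0 = root, -1 = not annotated)
    deprel: str   # DEPREL column (dependency relation label)


def parse_conllu_sentence(block: str) -> List[Token]:
    """Parse one CoNLL-U sentence block into a list of tokens."""
    tokens = []
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        head = int(cols[6]) if cols[6].isdigit() else -1
        tokens.append(Token(int(cols[0]), cols[1], cols[3], head, cols[7]))
    return tokens


def dependency_edges(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (head, dependent) pairs, leaving out the root and unannotated heads."""
    return [(t.head, t.idx) for t in tokens if t.head > 0]


# Hand-made illustrative annotation (an assumption, not dataset content).
SAMPLE_ROWS = [
    ["1", "Anna", "Anna", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "bor", "bo", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "i", "i", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "Oslo", "Oslo", "PROPN", "_", "_", "2", "obl", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Anna bor i Oslo.\n" + "\n".join("\t".join(r) for r in SAMPLE_ROWS)

tokens = parse_conllu_sentence(SAMPLE)
edges = dependency_edges(tokens)
```

The (head, dependent) pairs are the kind of sentence-level structure a graph-based layer could consume, while the word forms and POS tags are per-token inputs. How the NorG network actually encodes them is not specified in this summary.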
Summary
Machine learning models have been widely applied to Natural Language Processing (NLP) techniques, replacing the previous rule-based models and showing better performance. Named Entity Recognition (NER) is a type of NLP technique based on machine learning models that extracts entities from sentences [2]. NER has seen considerable development in English, and many data-driven models have been proposed. Compared with English, some languages have many complex linguistic structures. To address these grammar rules, this work proposes a grammar-based network for named entity recognition and selects four Nordic languages for the experiments. Experimental results demonstrate the effectiveness of the proposed method in the four languages, and exploratory experiments were conducted to examine the influence of different grammar components on NER performance.