Learning the Morphological and Syntactic Grammars for Named Entity Recognition

Mengtao Sun, Qiang Yang, Ibrahim A Hameed, Mark Pasquine, Hao Wang

doi:10.3390/info13020049

Open Access

Abstract

In some languages, Named Entity Recognition (NER) is severely hindered by complex linguistic structures, such as inflection, which can confuse data-driven models when they try to perceive a word's actual meaning. This work tries to alleviate these problems by introducing a novel neural network based on morphological and syntactic grammars. The experiments were performed on four Nordic languages, which have many grammar rules. The model is named the NorG network (Nor: Nordic languages, G: Grammar). In addition to the text content, the NorG network learns from each word's written form, its part-of-speech (POS) tag, and its dependency relations. The network consists of a bidirectional Long Short-Term Memory (Bi-LSTM) layer that captures word-level grammars and a bidirectional Graph Attention (Bi-GAT) layer that captures sentence-level grammars. Experimental results on the four languages show that the grammar-assisted network significantly outperforms the baselines. We also run exploratory experiments to investigate how each grammar component contributes to the NorG network's performance.
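To make the sentence-level component concrete, here is one plausible reading of a bidirectional graph attention layer over a dependency tree, written in plain Python. It is a sketch, not the NorG implementation: we assume the token vectors `H` come from the preceding Bi-LSTM, that "bidirectional" means one single-head GAT along head-to-dependent arcs and a second one along dependent-to-head arcs, and that the two outputs are concatenated. The names `gat_layer` and `bi_gat`, the single attention head, and the added self-loops are hypothetical choices made for illustration.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_layer(H, edges, W, a):
    """Single-head graph attention layer in the style of GAT.

    H: one feature vector per token; edges: (src, dst) pairs, dst attends to src;
    W: d_out x d_in projection; a: attention vector of length 2 * d_out.
    """
    Wh = [matvec(W, h) for h in H]
    nbrs = {i: {i} for i in range(len(H))}  # every node gets a self-loop
    for src, dst in edges:
        nbrs[dst].add(src)
    out = []
    for i in range(len(H)):
        js = sorted(nbrs[i])
        # unnormalised score e_ij = LeakyReLU(a . [Wh_i || Wh_j])
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, Wh[i] + Wh[j]))) for j in js]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]
        # new representation: attention-weighted sum of neighbour projections
        out.append([sum(al * Wh[j][k] for al, j in zip(alphas, js)) for k in range(len(Wh[i]))])
    return out


def bi_gat(H, heads, W_down, a_down, W_up, a_up):
    """heads[i] is the 0-based index of token i's syntactic head, or -1 for the root."""
    down = [(head, dep) for dep, head in enumerate(heads) if head >= 0]  # dependent attends to head
    up = [(dep, head) for head, dep in down]                             # head attends to dependents
    h_down = gat_layer(H, down, W_down, a_down)
    h_up = gat_layer(H, up, W_up, a_up)
    return [d + u for d, u in zip(h_down, h_up)]  # concatenate both directions


random.seed(0)
H = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 tokens, dim 4
heads = [1, -1, 1]  # token 1 is the root; tokens 0 and 2 depend on it
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
a = [random.uniform(-1, 1) for _ in range(4)]
out = bi_gat(H, heads, W, a, W, a)
print(len(out), len(out[0]))  # 3 4
```

The self-loop lets a token with no incoming arc (the root in the head-to-dependent direction, or a leaf in the other) keep its own projected features. A trained model would instead use learned, batched tensors and usually several attention heads.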

Highlights

Machine learning models are now widely used in Natural Language Processing (NLP), where they have replaced earlier rule-based models and perform better
Most leading Named Entity Recognition (NER) models are based on BERT [9], a pretrained language model built on the Transformer architecture [25] that produces contextual word embeddings
Our model was evaluated on the NorNE (Norwegian Bokmål), NorNE (Norwegian Nynorsk), DaNE (Danish), and Turku NER (Finnish) datasets, whose linguistic structures are annotated in CoNLL-U format
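Because these datasets ship their grammar annotations in CoNLL-U, the word forms, POS tags, and dependency arcs a grammar-aware model consumes can be read straight from its ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The minimal reader below is a sketch, not the authors' preprocessing code; `Token`, `parse_conllu`, and `dependency_arcs` are hypothetical names, and the Norwegian sample sentence ("Hun bor i Oslo", "She lives in Oslo") and its annotation are made up for illustration. Where each dataset stores its entity labels is not stated here, so the reader simply keeps the MISC column as a raw string.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int   # 1-based position in the sentence (CoNLL-U ID)
    form: str    # word form as written (FORM)
    upos: str    # universal POS tag (UPOS)
    head: int    # ID of the syntactic head; 0 marks the root (HEAD)
    deprel: str  # dependency relation to the head (DEPREL)
    misc: str    # free-form column, kept raw (MISC)


def parse_conllu(text):
    """Yield each sentence in CoNLL-U text as a list of Token objects."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:  # skip multiword-token ranges and empty nodes
            continue
        sentence.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7], cols[9]))
    if sentence:  # the text may not end with a blank line
        yield sentence


def dependency_arcs(sentence):
    """Return (head ID, dependent ID) pairs, leaving out the artificial root."""
    return [(tok.head, tok.index) for tok in sentence if tok.head > 0]


rows = [
    ["1", "Hun", "hun", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "bor", "bo", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "i", "i", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "Oslo", "Oslo", "PROPN", "_", "_", "2", "obl", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
sample = "# text = Hun bor i Oslo .\n" + "\n".join("\t".join(r) for r in rows) + "\n"

sent = next(parse_conllu(sample))
print([(t.form, t.upos) for t in sent])
print(dependency_arcs(sent))  # [(2, 1), (4, 3), (2, 4), (2, 5)]
```

The arc list is exactly the kind of graph a dependency-based attention layer would operate on, while FORM and UPOS supply the word-form and POS inputs.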

Summary

Introduction

Machine learning models are now widely used in Natural Language Processing (NLP), where they have replaced earlier rule-based models and perform better. Named Entity Recognition (NER) is an NLP task, typically solved with machine learning models, that extracts entities from sentences [2]. NER has seen considerable development in English, and many data-driven models have been proposed. Compared with English, some languages have more complex linguistic structures, such as inflection. To exploit these grammar rules, this work proposes a grammar-based network for named entity recognition and evaluates it on four Nordic languages. Experimental results demonstrate the effectiveness of the proposed method in all four languages, and exploratory experiments examine how the different grammar components influence NER performance.
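An NER model usually outputs one tag per token, and the entities themselves are recovered by grouping those tags. Assuming a BIO scheme (B- opens an entity, I- continues it, O is outside), which is common but not confirmed here for these datasets, a hypothetical helper `bio_to_spans` could look like this:

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (type, start, end, text) spans; end is exclusive."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, typ = tag.partition("-")
        # the open entity ends at O, at a new B-, or at an I- of another type
        if label is not None and (prefix != "I" or typ != label):
            spans.append((label, start, i, " ".join(tokens[start:i])))
            label = None
        # B- opens an entity; a stray I- is treated leniently as an opener too
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = typ, i
    return spans


tokens = ["Ola", "Nordmann", "bor", "i", "Bergen"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('PER', 0, 2, 'Ola Nordmann'), ('LOC', 4, 5, 'Bergen')]
```

The helper is lenient: an I- tag that does not continue an entity of the same type opens a new entity instead of being discarded, which is one of several conventions in use.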

Related Works

Materials and Methods

NorG Embedding

Bi-LSTM Layer

Bi-GAT Layer

CRF Layer

NER Datasets

Norwegian Bokmål and Nynorsk

Danish

Finnish

Baselines

Hyperparameters of the NorG Network

Results

Main Results

Ablation Experiments

Performance against Sentence Length

Performance on Automatically Obtained Grammars

Journal: Information
Publication Date: Jan 20, 2022
Citations: 4
License type: CC BY 4.0

Similar Papers

Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory
Joan Santoso ... Mauridhi Hery Purnomo
Expert Systems with Applications, Vol. 176, 13 Mar 2021

End to End Parts of Speech Tagging and Named Entity Recognition in Bangla Language
Jillur Rahman Saurav ... Summit Haque
01 Sep 2019

A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text
Ying Xiong ... Qingcai Chen
BMC Medical Informatics and Decision Making, Vol. 19, 01 Apr 2019

A multi-head adjacent attention-based pyramid layered model for nested named entity recognition
Shengmin Cui ... Inwhee Joe
Neural Computing & Applications, Vol. 35, 01 Sep 2022