Part-of-Speech Tagging with Rule-Based Data Preprocessing and Transformer

Hongwei Li, Hongyan Mao, Jingzi Wang

doi:10.3390/electronics11010056

Abstract

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). The POS tag of a word depends not only on the word itself but also on its position, its surrounding words, and their POS tags. POS tagging can serve as an upstream task for other NLP tasks and improve their performance, so improving its accuracy is important. Bidirectional Long Short-Term Memory (Bi-LSTM) is commonly used for POS tagging and achieves good performance. However, Bi-LSTM is not as powerful as Transformer in leveraging contextual information, since Bi-LSTM simply concatenates the left-to-right and right-to-left contextual information. In this study, we propose a novel approach to improve the accuracy of POS tagging. For each token, all possible POS tags are first obtained without considering context, and rules are then applied to prune these candidate tags; we call this step rule-based data preprocessing. In this way, the number of possible POS tags of most tokens can be reduced to one, and these tokens are considered to be correctly tagged. Finally, the POS tags of the remaining tokens are masked, and a Transformer-based model predicts only the masked POS tags, which enables it to leverage bidirectional context. Our experimental results show that our approach performs better than other methods that use Bi-LSTM.
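The rule-based data preprocessing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the tag names, the single pruning rule table, and all function names (candidate_tags, prune, mask_ambiguous, preprocess) are assumptions made up for illustration. It only shows the general shape of the pipeline: look up every possible tag per token, prune candidates with rules, and mask whatever stays ambiguous so that a model can predict it later.

```python
from typing import Dict, List, Set

MASK = "[MASK]"

# Hypothetical toy lexicon: every tag a word may take, ignoring context.
LEXICON: Dict[str, Set[str]] = {
    "i": {"PRON"},
    "the": {"DET"},
    "like": {"VERB", "ADP"},
    "dogs": {"NOUN", "VERB"},
}

# Fallback candidates for words missing from the toy lexicon (an assumption).
UNKNOWN: Set[str] = {"NOUN", "VERB", "ADJ"}

# Hypothetical rules: tags that may not directly follow an unambiguous tag.
FORBIDDEN_AFTER: Dict[str, Set[str]] = {
    "DET": {"VERB"},
    "PRON": {"ADP"},
}


def candidate_tags(tokens: List[str]) -> List[Set[str]]:
    """Collect all context-free candidate tags for each token."""
    return [set(LEXICON.get(tok.lower(), UNKNOWN)) for tok in tokens]


def prune(cands: List[Set[str]]) -> List[Set[str]]:
    """Apply the rules left to right, never emptying a candidate set."""
    pruned = [set(c) for c in cands]
    for i in range(1, len(pruned)):
        prev = pruned[i - 1]
        if len(prev) != 1:
            continue  # the rules only fire after an unambiguous neighbour
        (prev_tag,) = prev
        remaining = pruned[i] - FORBIDDEN_AFTER.get(prev_tag, set())
        if remaining:
            pruned[i] = remaining
    return pruned


def mask_ambiguous(cands: List[Set[str]]) -> List[str]:
    """Keep tags resolved to one candidate; mask the rest for the model."""
    return [next(iter(c)) if len(c) == 1 else MASK for c in cands]


def preprocess(tokens: List[str]) -> List[str]:
    return mask_ambiguous(prune(candidate_tags(tokens)))


print(preprocess(["I", "like", "dogs"]))  # ['PRON', 'VERB', '[MASK]']
print(preprocess(["The", "dogs"]))        # ['DET', 'NOUN']
```

In the first sentence the toy rules resolve "like" but leave "dogs" ambiguous, so its tag is masked; in the approach described above, a Transformer-based model would then predict only such masked positions using the tags and tokens on both sides.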

Highlights

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP).
Given the Transformer's strength in leveraging context, we propose a Transformer-based model for POS tagging.
The overall tagging accuracy is jointly determined by the rule-based data preprocessing and the model.

Summary

Introduction

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). It assigns a POS tag to each word in a given sentence. For a short and simple sentence such as “I like dogs”, a POS tagger can identify the word “I” as a pronoun, the word “like” as a verb, and the word “dogs” as a noun. In complex sentences, however, some words are difficult for POS taggers to tag correctly. POS tagging can be an upstream task for other NLP tasks, such as semantic parsing [1], machine translation [2], and relation extraction [3], and can improve their performance. Improving the accuracy of POS tagging is therefore an important goal.



Journal: Electronics
Publication Date: Dec 24, 2021
Citations: 15
License type: CC BY 4.0


