Abstract

We present the first multi-task learning model – named PhoNLP – for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT (Nguyen and Nguyen, 2020) for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0. Although we specify PhoNLP for Vietnamese, our PhoNLP training and evaluation command scripts can in fact directly work for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications, not only for Vietnamese but also for other languages. Our PhoNLP is available at https://github.com/VinAIResearch/PhoNLP.

Highlights

  • Multi-task learning is a promising solution as it might help reduce the storage space

  • Given an input sentence of words to PhoNLP, an encoding layer generates contextualized word embeddings that represent the input words. These contextualized word embeddings are fed into a POS tagging layer that is a linear prediction layer (Devlin et al, 2019) to predict POS tags for the input words

  • Our PhoNLP can be viewed as an extension of previous joint POS tagging and dependency parsing models (Hashimoto et al, 2017; Li et al, 2018; Nguyen and Verspoor, 2018; Nguyen, 2019; Kondratyuk and Straka, 2019), where we incorporate a CRF-based prediction layer for NER.
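The POS tagging layer described in the highlights above, a linear prediction layer applied to contextualized word embeddings, can be sketched as follows. This is an illustrative NumPy sketch rather than the actual PhoNLP code, and all dimensions are hypothetical:

```python
import numpy as np

# Illustrative sketch of a linear POS prediction layer over
# contextualized word embeddings (not the actual PhoNLP code).

def linear_pos_layer(embeddings, weight, bias):
    """Score every POS tag for every word: logits = X @ W.T + b."""
    return embeddings @ weight.T + bias

# Hypothetical example: 4 words, 768-dim embeddings, 20 POS tags.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 768))   # contextualized word embeddings
W = rng.standard_normal((20, 768))  # tag scoring weights
b = np.zeros(20)
logits = linear_pos_layer(X, W, b)  # shape (4, 20)
pred_tags = logits.argmax(axis=-1)  # one predicted tag id per word
```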


Summary

Introduction

Vietnamese NLP research has been significantly explored recently. It has been boosted by the success of the national project on Vietnamese language and speech processing (VLSP) KC01.01/2006-2010 and the VLSP workshops that have run shared tasks since 2013. Fundamental tasks of POS tagging, NER and dependency parsing play important roles, providing useful features for many downstream application tasks such as machine translation (Tran et al, 2016), sentiment analysis (Bang and Sornlertlamvanich, 2018), relation extraction (To and Do, 2020), semantic parsing (Nguyen et al, 2020), open information extraction (Truong et al, 2017) and question answering.

Model description

Based on both the contextualized word embeddings and the "soft" POS tag embeddings, the NER layer uses a linear-chain CRF predictor (Lafferty et al, 2001) to predict NER labels for the input words, while the dependency parsing layer uses a Biaffine classifier (Dozat and Manning, 2017) to predict dependency arcs between the words and another Biaffine classifier to predict dependency labels. Following Hashimoto et al (2017), the "soft" POS tag embedding t_i^(1) is computed by multiplying a label weight matrix W^(1) with the corresponding probability vector p_i. During training, an objective loss L_DEP is computed by summing a cross-entropy loss for unlabeled dependency parsing and another cross-entropy loss for dependency label prediction, based on gold arcs and arc labels. The final training objective loss of PhoNLP is the weighted sum of the POS tagging loss L_POS, the NER loss L_NER and the dependency parsing loss L_DEP.
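The pieces of the model description above, the "soft" POS tag embedding t_i^(1) = W^(1) p_i, the Biaffine arc scorer, and the weighted sum of the three task losses, can be sketched in NumPy as follows. This is an illustrative sketch under assumed dimensions and placeholder loss weights, not the actual PhoNLP implementation:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_pos_embedding(pos_logits, label_weight):
    """Compute the "soft" POS tag embedding (Hashimoto et al., 2017):
    t_i = W @ p_i, where p_i is the POS probability vector of word i.
    pos_logits: (seq_len, num_tags); label_weight: (embed_dim, num_tags)."""
    p = softmax(pos_logits)
    return p @ label_weight.T              # (seq_len, embed_dim)

def biaffine_arc_scores(dep, head, U, u):
    """Biaffine arc scorer (Dozat and Manning, 2017):
    score[i, j] = dep_i^T U head_j + head_j^T u, the score of word j
    being the head of word i. dep, head: (seq_len, dim)."""
    return dep @ U @ head.T + head @ u     # (seq_len, seq_len)

def joint_loss(l_pos, l_ner, l_dep, w_pos=1.0, w_ner=1.0, w_dep=1.0):
    """Final objective: weighted sum of the three task losses.
    The weight values here are hypothetical placeholders."""
    return w_pos * l_pos + w_ner * l_ner + w_dep * l_dep
```

In the paper's notation, `label_weight` plays the role of W^(1), and L_DEP itself would be the sum of the unlabeled-arc and arc-label cross-entropy losses before entering `joint_loss`.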

Discussion
Dependency parsing
Implementation
Experiments
Findings
PhoNLP toolkit