Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Weiwei Sun, Xiaojun Wan

doi:10.1162/coli_a_00253

Abstract

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Although hybrid systems are effective at boosting accuracy, their reliance on computationally expensive parsers makes them unsuitable for many realistic NLP applications. In this article, we are therefore also concerned with improving tagging efficiency at test time. In particular, we use unlabeled data to transfer the predictive power of hybrid models to simple sequence models: hybrid systems are used to create large-scale pseudo training data for cheap models. Experimental results show that the re-compiled models not only achieve high accuracy in per-token classification, but also serve well as a front end to a parser.
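
The idea of re-compiling an expensive hybrid system into a cheap model through pseudo training data can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `hybrid_tagger` is a hypothetical lookup-table stand-in for the slow hybrid system, `train_cheap_tagger` is a most-frequent-tag-per-word tagger standing in for a simple sequence model, and the sentences and tags are invented examples that follow Penn Chinese Treebank tag names.

```python
from collections import Counter, defaultdict


def train_cheap_tagger(tagged_sentences, default_tag="NN"):
    """Train a most-frequent-tag-per-word tagger.

    Assumption: this is only a stand-in for a cheap sequence model.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Keep the single most frequent tag observed for each word.
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(words):
        # Unknown words fall back to the default tag.
        return [(w, lexicon.get(w, default_tag)) for w in words]

    return tag


def make_pseudo_data(expensive_tagger, unlabeled_sentences):
    """Label raw sentences with the expensive system to get pseudo training data."""
    return [expensive_tagger(words) for words in unlabeled_sentences]


def hybrid_tagger(words):
    """Hypothetical stand-in for the slow hybrid system (a fixed lookup table)."""
    oracle = {"他": "PN", "爱": "VV", "上海": "NR", "很": "AD", "大": "VA"}
    return [(w, oracle[w]) for w in words]


# Invented toy data: one gold sentence and two unlabeled sentences.
gold = [[("我", "PN"), ("爱", "VV"), ("北京", "NR")]]
unlabeled = [["他", "爱", "上海"], ["上海", "很", "大"]]

pseudo = make_pseudo_data(hybrid_tagger, unlabeled)
baseline = train_cheap_tagger(gold)             # gold data only
recompiled = train_cheap_tagger(gold + pseudo)  # gold plus pseudo data

print(baseline(["上海", "很", "大"]))
print(recompiled(["上海", "很", "大"]))
```

At test time only the cheap tagger runs; the expensive system is needed once, offline, to label the unlabeled text. In this toy setting the baseline falls back to the default tag for every unseen word, while the re-compiled tagger reproduces the labels the stand-in hybrid system assigned.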

Journal: Computational Linguistics	Publication Date: Sep 1, 2016
Citations: 25	License type: cc-by-nc-nd
