Abstract

We present NiuParser, a new toolkit for Chinese syntactic and semantic analysis. It handles a wide range of Chinese Natural Language Processing (NLP) tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. NiuParser runs fast and achieves state-of-the-art performance on several benchmarks. It is also easy to use for both research and industrial purposes. Advanced features include Software Development Kit (SDK) interfaces and a multithreaded implementation for speed-up.
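The abstract mentions a multithreaded implementation for speed-up but does not describe it. The following is only a minimal Python sketch of one common approach, not NiuParser's own design. It assumes that sentences can be analyzed independently: a thread pool maps a per-sentence analyzer over the input and returns the results in input order. `analyze_in_parallel` and `toy_analyze` are hypothetical names, and `toy_analyze` is a placeholder rather than a real segmenter.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def analyze_in_parallel(sentences: List[str],
                        analyze: Callable[[str], str],
                        num_threads: int = 4) -> List[str]:
    """Apply a per-sentence analyzer to independent sentences with a thread pool.

    Executor.map yields results in the order of the inputs, so the output
    lines up with the input sentences even if threads finish out of order.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(analyze, sentences))


def toy_analyze(sentence: str) -> str:
    # Hypothetical stand-in for a real analyzer: separates every character.
    return " ".join(sentence)


if __name__ == "__main__":
    print(analyze_in_parallel(["我爱北京", "你好"], toy_analyze, num_threads=2))
```

In CPython, threads give a real speed-up only when the analyzer releases the global interpreter lock, as native extension code can. A pure-Python analyzer would need a process pool instead.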

Highlights

  • Chinese has been one of the most popular world languages for years

  • Several systems have been developed for Chinese word segmentation, part-of-speech tagging, and syntactic parsing, though some of them are not optimized for Chinese

  • We trained and tested word segmentation, POS tagging, chunking, and constituent parsing on CTB5.1: articles 001-270 and 440-1151 were used for training and articles 271-300 were used for testing
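The CTB5.1 split in the last highlight can be stated explicitly. This is a minimal Python sketch that assumes articles are identified by integer IDs. The `chtb_XXX.fid` filename pattern is an assumption about the corpus layout. The ID ranges may also include numbers that are absent from the corpus, and a loader would simply skip those.

```python
from typing import List, Tuple


def ctb51_split() -> Tuple[List[int], List[int]]:
    """Return (train_ids, test_ids) for the split stated in the highlights."""
    # Training: articles 001-270 and 440-1151 (both ranges inclusive)
    train = list(range(1, 271)) + list(range(440, 1152))
    # Testing: articles 271-300 (inclusive)
    test = list(range(271, 301))
    return train, test


def article_filename(article_id: int) -> str:
    # Assumed naming pattern: "chtb_" plus the ID zero-padded to at least 3 digits.
    return f"chtb_{article_id:03d}.fid"


if __name__ == "__main__":
    train_ids, test_ids = ctb51_split()
    print(len(train_ids), len(test_ids))  # 982 30
    print(article_filename(test_ids[0]))  # chtb_271.fid
```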


Summary

Introduction

Chinese has been one of the most popular world languages for years. Because of its complexity and diverse underlying structures, processing Chinese is challenging and has clearly been an important part of Natural Language Processing (NLP). Many tasks have been proposed to analyze and understand Chinese, ranging from word segmentation to syntactic and/or semantic parsing, and these tasks benefit a wide range of natural language applications. Several systems have been developed for Chinese word segmentation, part-of-speech tagging, and syntactic parsing (for example, Stanford CoreNLP, FudanNLP, and LTP), though some of them are not optimized for Chinese. The NiuParser toolkit handles most Chinese parsing-related tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. All subsystems in NiuParser are based on statistical models learned automatically from data. We optimize these systems for Chinese in several ways, including handcrafted rules for pre- and post-processing, heuristics in various algorithms, and a number of tuned features.

Proceedings of ACL-IJCNLP 2015 System Demonstrations, pages 145–150, Beijing, China, July 26-31, 2015. © 2015 ACL and AFNLP

What is NiuParser
Sequence Labeling
Transition-based Parsing
Two-Stage Classification
Word Segmentation
Named Entity Recognition
System Speed-up
Experiments
Findings
Conclusions and Future Work