Abstract

This paper introduces widely used Chinese morphological analyzers (e.g., ICTCLAS, Jieba, Stanford CoreNLP) and provides a detailed tutorial on their use in two Chinese preprocessing tasks: Chinese Word Segmentation (CWS) and part-of-speech (POS) tagging. To improve the usability of these tools, simple executables were developed and distributed to linguistic researchers unfamiliar with coding, together with extensive execution examples in both GUI and CLI environments. In addition, the distinctive features and functions of each analyzer are described, and the analyzer best suited to the needs of individual researchers is recommended. Because data-driven quantitative research on Chinese inevitably involves morphological analysis, this study serves as a guide, offering practical tools and guidelines for Chinese text preprocessing to researchers who want to extend their research interests to corpus linguistics, computational linguistics, and natural language processing.
