Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example

Wenjie Fan, Shuiqing Huang, Dongbo Wang

doi:10.1093/llc/fqad016

Abstract

Most classical Chinese texts contain no sentence boundaries. Because such texts are hard to read, experts in literature or linguistics segment them into sentences manually. This article explores the effectiveness of classical Chinese sentence segmentation methods, with the aim of providing a reference for the punctuation of classical Chinese. Building on machine learning methods, we compare learning results across three components: models, tagging schemes, and features. The models are conditional random field (CRF) models, long short-term memory (LSTM) models, BiLSTM–CRF models, and three Bidirectional Encoder Representations from Transformers (BERT) models. We test five tagging schemes and three features: a statistical feature and two phonological features, Guangyun and Fanqie. Finally, the performance of the combined feature template is evaluated by ten-fold cross-validation on four classical Chinese texts of different genres. The SikuBERT model proves to be the most effective model for sentence segmentation at present. Among the tagging schemes compared, the 5-tag-J scheme can improve performance. The statistical feature, an important clue for classical Chinese sentence segmentation, is useful in related tasks, whereas Guangyun and Fanqie have little impact. Genre and writing style are also important factors in sentence segmentation.
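
The abstract treats segmentation as a tagging problem: punctuation is stripped, each remaining character receives a label, and a model (CRF, LSTM, BiLSTM–CRF or BERT) predicts those labels. The following minimal sketch illustrates that framing under stated assumptions. It uses a generic 4-tag B/M/E/S scheme, not any of the paper's five schemes (the abstract does not define 5-tag-J). The `BOUNDARY_MARKS` set and the helper names `to_tags`, `from_tags` and `k_fold` are hypothetical. `k_fold` assigns items to folds by interleaving, which may differ from how the authors split their texts for ten-fold cross-validation.

```python
from typing import Iterator, List, Tuple

# Hypothetical boundary set: punctuation marks whose positions become labels.
BOUNDARY_MARKS = set("。，、；：？！")


def split_on_marks(punctuated: str) -> List[str]:
    """Split a punctuated string into unpunctuated segments."""
    segments, current = [], []
    for ch in punctuated:
        if ch in BOUNDARY_MARKS:
            if current:  # ignore consecutive marks
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # trailing text with no final mark
        segments.append("".join(current))
    return segments


def to_tags(punctuated: str) -> Tuple[str, List[str]]:
    """Return the bare character string and one B/M/E/S tag per character.

    Assumed generic scheme (not the paper's): B = segment start, M = middle,
    E = segment end, S = one-character segment.
    """
    segments = split_on_marks(punctuated)
    tags: List[str] = []
    for seg in segments:
        if len(seg) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(seg) - 2) + ["E"])
    return "".join(segments), tags


def from_tags(text: str, tags: List[str]) -> List[str]:
    """Invert the tagging: a boundary follows every E or S tag."""
    segments, start = [], 0
    for i, tag in enumerate(tags):
        if tag in ("E", "S"):
            segments.append(text[start:i + 1])
            start = i + 1
    if start < len(text):  # predicted tags never closed the last segment
        segments.append(text[start:])
    return segments


def k_fold(items: list, k: int = 10) -> Iterator[Tuple[list, list]]:
    """Yield (train, test) pairs; item i goes to test fold i % k (assumption)."""
    for fold in range(k):
        test = items[fold::k]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    text, tags = to_tags("元年春，王正月。三月，公及邾儀父盟于蔑。")
    print(text)              # 元年春王正月三月公及邾儀父盟于蔑
    print(" ".join(tags))    # B M E B M E B E B M M M M M M E
    print(from_tags(text, tags))  # ['元年春', '王正月', '三月', '公及邾儀父盟于蔑']
```

Because a boundary is recovered after every segment-final label, any tagger that emits one label per character can sit between `to_tags` and `from_tags`. Schemes with more tags typically refine the labels inside a segment while keeping a segment-final label that marks the boundary.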

Journal: Digital Scholarship in the Humanities

Similar Papers

Do readers adjust their lower‐ and higher‐level language skills according to text structures? Evidence from eye movements in Chinese text reading
Minglei Chen ... Chiahsing Chen
Journal of Research in Reading | VOL. 43
26 Jan 2020

Conversational Style and Omissions in Classical Chinese and Their Implications for Classical Chinese Grammar Pedagogy
Sue-Mei Wu
30 Sep 2022

Research on Chinese Semantic Named Entity Recognition in Marine Engine Room Systems Based on BERT
Henglong Shen ... Guangxi Sun
Journal of Marine Science and Engineering | VOL. 11
21 Jun 2023

One Improved Model of Named Entity Recognition by Combining BERT and BiLSTM-CNN for Domain of Chinese Railway Construction
Xiaojun Wu ... Sheng Yuan
15 Apr 2022