Abstract
Abstract Words are essential parts for understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems that were annotated with word boundaries. Statistics about the annotated poems support a few heuristic experiences, including the patterns of lines and a practice for the parallel structures (對仗), that researchers of Chinese literature discuss in the literature. The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. Results of an inter-rater agreement study indicate that the annotators have consensus over the identified words 93 per cent of the time and have perfect consensus for the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings, and evaluated the results with human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words, and segmented poems perfectly 22 per cent of the time.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.