Exploring Thematic Diversity in Classical Chinese Poetry: A Novel Dataset and a BERT-enhanced Ensemble Learning Approach

Jingrui Hou, Shitou Zhang

doi:10.1145/3685679

Abstract

Classical Chinese poetry, an essential part of cultural heritage, exhibits rich thematic diversity that is often overlooked in natural language processing research. To address this gap, we explore the classification of thematic categories within this literary domain. We curate a dataset of 2,918 annotated poems spanning seven common themes and propose a BERT-based ensemble learning approach for classifying them. Although the method integrates existing models, it achieves an accuracy and F1 score of over 72% on the 7-class task, surpassing established baselines and providing a reference point for future research. The experimental findings show that ensemble strategies improve on the performance of the individual base models and highlight the potential of the MLP-based ensemble technique. The study contributes to a deeper understanding of thematic categories and textual features in classical Chinese poetry and offers an automated classification system for classical Chinese poems.

Full Text