Abstract

This article proposes a multilabel poem topic classification algorithm that utilizes large language models and auxiliary data to address the lack of diverse metadata in digital poetry libraries. The study examines the potential of context‐dependent language models, specifically bidirectional encoder representations from transformers (BERT), for understanding poetic words, as well as the use of auxiliary data, such as authors' notes, to supplement the poem text. The experimental results demonstrate that the BERT‐based model outperforms the traditional support vector machine‐based model across all input types and datasets. We also show that incorporating notes as an additional input improves the performance of the poem‐only model. Overall, the study suggests that pretrained context‐dependent language models and auxiliary data have the potential to enhance the accessibility of the varied poems within collections. This research can ultimately promote the discovery of underrepresented poems in digital libraries, even when they lack associated metadata, thereby deepening the understanding and appreciation of the literary form.
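The abstract does not specify the model configuration, but the core idea can be illustrated with a minimal sketch, assuming a Hugging Face BERT multilabel classification head and a hypothetical topic set; the way the poem and the author's notes are fused (here, as a standard BERT sentence pair) is likewise an assumption, not the authors' published method.

```python
# Minimal sketch: multilabel topic classification over a poem plus its
# author's notes, using BERT with a sigmoid/BCE multilabel head.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical topic labels; the paper's actual taxonomy is not given here.
TOPICS = ["nature", "love", "grief", "war", "faith"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(TOPICS),
    problem_type="multi_label_classification",  # uses BCE-with-logits loss
)

poem = "Two roads diverged in a yellow wood..."
notes = "Written while recalling a walk with a friend."

# Encode poem and notes as a sentence pair: [CLS] poem [SEP] notes [SEP].
inputs = tokenizer(poem, notes, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Independent sigmoid per topic; threshold at 0.5 for the multilabel set.
probs = torch.sigmoid(logits)[0]
predicted = [t for t, p in zip(TOPICS, probs) if p > 0.5]
print(predicted)
```

With an untrained head the predictions are of course arbitrary; the point is the multilabel setup (one independent probability per topic rather than a softmax over mutually exclusive classes), which is what allows a poem to carry several topic labels at once.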
