Abstract

Poetry generation is a complex task in natural language processing, particularly when only a small dataset is available. Data augmentation has proven effective at improving the performance of deep learning models across a range of tasks, including image classification and speech recognition. This study therefore examines the impact of four data augmentation methods (Synonym Replacement, Random Insertion, Random Swap, and Random Deletion) on poetry generation with a small poetry dataset. The results show that Random Insertion outperformed the other augmentation techniques in terms of Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and manual evaluation, whereas Synonym Replacement performed poorly on all three. These findings confirm the potential value of data augmentation for poetry generation and offer new perspectives and directions for future research in this area. Data augmentation can help address the problem of limited data in poetry generation and improve the effectiveness of deep learning models. Future work could explore more advanced augmentation techniques and their impact on poetry generation.
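The four operations named above correspond to the widely used Easy Data Augmentation (EDA) family of token-level edits. The following is a minimal Python sketch of how such operations are commonly implemented; it is an illustration under stated assumptions, not the study's actual pipeline. In particular, the `get_synonyms` parameter and the toy lookup table are hypothetical placeholders for whatever thesaurus or embedding-based synonym source a real setup would use.

```python
import random
from typing import Callable, List

def synonym_replacement(tokens: List[str], n: int,
                        get_synonyms: Callable[[str], List[str]]) -> List[str]:
    """Replace up to n tokens that have synonyms with a randomly chosen synonym."""
    out = tokens[:]
    candidates = [i for i, tok in enumerate(out) if get_synonyms(tok)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = random.choice(get_synonyms(out[i]))
    return out

def random_insertion(tokens: List[str], n: int,
                     get_synonyms: Callable[[str], List[str]]) -> List[str]:
    """Insert a synonym of a randomly chosen token at a random position, n times."""
    out = tokens[:]
    for _ in range(n):
        syns: List[str] = []
        for _ in range(10):  # retry a few times if the sampled token has no synonym
            syns = get_synonyms(random.choice(out))
            if syns:
                break
        if syns:
            out.insert(random.randrange(len(out) + 1), random.choice(syns))
    return out

def random_swap(tokens: List[str], n: int) -> List[str]:
    """Swap the tokens at two randomly chosen positions, n times."""
    out = tokens[:]
    for _ in range(n):
        i, j = random.randrange(len(out)), random.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens: List[str], p: float) -> List[str]:
    """Delete each token independently with probability p, keeping at least one token."""
    kept = [tok for tok in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

# Toy synonym lookup for illustration only; a real experiment might use WordNet
# or a language-specific thesaurus (an assumption, not the paper's method).
toy_synonyms = {"bright": ["radiant", "shining"], "moon": ["luna"]}
lookup = lambda w: toy_synonyms.get(w, [])
print(random_insertion("the bright moon rises".split(), n=1, get_synonyms=lookup))
```

Each augmented line produced this way can be added to the training set alongside the original, which is how such token-level edits enlarge a small poetry corpus without collecting new data.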
