GeDa: Improving training data with large language models for Aspect Sentiment Triplet Extraction

Weixing Mai, Zhengxuan Zhang, Yifan Chen, Kuntao Li, Yun Xue

doi:10.1016/j.knosys.2024.112289

Abstract

Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-based Sentiment Analysis (ABSA). Recent ASTE methods have achieved promising results. However, the performance of ASTE models is limited by both the quantity and the quality of training data. The challenges therefore lie in collecting valuable data and in selecting data targeted to diverse ASTE model architectures. To this end, we propose a novel General Data-Centric Framework (GeDa), which improves the training data for ASTE models accurately and efficiently. Specifically, two types of prompts are designed to guide large language models in synthesizing candidate data for the ASTE task. Then, a Characteristic-Driven Iterative Strategy is proposed to optimize the interaction between the model and the training data: data is iteratively selected from the synthetic candidates to improve both the quantity and the quality of the training data. After multiple iterations, a targeted training set is obtained that benefits ASTE model learning. Extensive experiments show that ASTE models with GeDa gain more than 5% in average F1 while adding only a small amount of training data.

Full Text