Abstract

Imbalanced data constitute an extensively studied problem in the field of machine learning classification because they result in poor training outcomes. Data augmentation is a method for increasing minority class diversity. In the field of text data augmentation, easy data augmentation (EDA) is used to generate additional data that would otherwise lack diversity and exhibit monotonic sentence patterns. Generative adversarial network (GAN) models can generate diverse sentence patterns by using the probability corresponding to each word in a language model. Therefore, hybrid EDA and GAN models can generate highly diverse and appropriate sentence patterns. This study proposes a hybrid framework that employs a generative adversarial network and Shapley algorithm based on easy data augmentation (HEGS) to improve classification performance. The experimental results reveal that the HEGS framework can generate highly diverse training sentences to form balanced text data and improve text classification performance for minority classes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call