Abstract

Text augmentation is a popular technique in natural language processing (NLP) that has been shown to improve performance on various downstream tasks. Its goal is to generate additional training data from existing data, thereby increasing the amount of data available for training machine learning models. In this paper, we propose a novel approach to text augmentation, called GTR-GA, that combines graph-based neural networks with genetic algorithms to generate diverse, high-quality augmented text data by exploring the high-dimensional feature space of text. Our approach uses a graph attention network (GAT) based model, called HetGAPN, to learn node representations in a heterogeneous graph representing the text data; we introduce node aggregation and bidirectional attention in HetGAPN to better capture the relationships between different features of the input text. A genetic algorithm then generates a diverse set of candidate embeddings that explore different parts of the input space, using perplexity as the objective function to evaluate the quality of the augmented text produced by each candidate. The fittest candidate embeddings are selected as parents, and crossover and mutation operators generate a new population of candidates. Our experimental results show that GTR-GA generates high-quality augmented text data that improves performance on downstream NLP tasks such as sentiment analysis and text classification. The proposed scheme can be applied to a wide range of NLP tasks and can help overcome the data scarcity problem often encountered in NLP.
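The evolutionary loop described above — evaluate candidate embeddings by a fitness score, keep the fittest as parents, and produce the next population via crossover and mutation — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names (`fitness`, `evolve`, `elite`, etc.) are hypothetical, and the toy `fitness` stands in for the perplexity of text decoded from each embedding, which the paper uses as the objective (lower is better).

```python
import random

def fitness(embedding):
    # Stand-in for the perplexity objective: GTR-GA scores each candidate by
    # the perplexity of the augmented text it yields (lower = fitter). Here a
    # toy surrogate (sum of squares) keeps the sketch self-contained.
    return sum(x * x for x in embedding)

def crossover(a, b):
    # Single-point crossover between two parent embeddings.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(embedding, rate=0.1, scale=0.5):
    # Perturb each dimension with probability `rate` to explore nearby
    # regions of the embedding space.
    return [x + random.gauss(0.0, scale) if random.random() < rate else x
            for x in embedding]

def evolve(population, generations=20, elite=4):
    for _ in range(generations):
        population.sort(key=fitness)          # fittest (lowest score) first
        parents = population[:elite]          # elitism: keep best candidates
        children = []
        while len(children) < len(population) - elite:
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    return min(population, key=fitness)

random.seed(0)
# A toy population of 20 candidate embeddings of dimension 8.
pop = [[random.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(20)]
best = evolve(pop)
```

In the actual scheme the candidates would be HetGAPN-derived embeddings and the fitness evaluation would decode each candidate to text and measure its perplexity under a language model; only the selection/crossover/mutation skeleton is shown here.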
