Abstract

Creating high-quality labeled data is crucial for training machine-learning models, but it is time-consuming and labor-intensive. Moreover, human annotators vary in competency, training, and experience, which can result in inconsistent labeling and arbitrary standards. To address these challenges, researchers have been exploring automated methods for enhancing training and testing datasets. This paper proposes SRL-ACO, a novel text augmentation framework that leverages Semantic Role Labeling (SRL) and Ant Colony Optimization (ACO) techniques to generate additional training data for natural language processing (NLP) models. The framework uses SRL to identify the semantic roles of words in a sentence and ACO to generate new sentences that preserve these roles. SRL-ACO can enhance the accuracy of NLP models by generating additional data without requiring manual annotation. The paper presents experimental results demonstrating the effectiveness of SRL-ACO on seven text classification datasets covering sentiment analysis, toxic text detection, and sarcasm identification. The results show that SRL-ACO improves classifier performance across these NLP tasks, demonstrating its potential to enhance both the quality and quantity of training data for a range of NLP applications.
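To make the pipeline concrete, the sketch below illustrates the general idea of role-preserving augmentation driven by an ant-colony search. It is not the paper's implementation: the sentence, the role table, the candidate substitutions, and the fitness function are hypothetical placeholders, and a real system would obtain roles from an SRL model and score candidates with task-aware criteria such as fluency or label preservation.

```python
import random

# Hypothetical "SRL" output: tokens of a sentence with the role-bearing slots
# that augmentation is allowed to modify. A real system would get these roles
# from an SRL model rather than hard-coding them.
SENTENCE = ["the", "movie", "was", "boring"]
CANDIDATES = {                      # role-preserving substitutes per slot
    1: ["movie", "film", "picture"],   # e.g. ARG1
    3: ["boring", "dull", "tedious"],  # e.g. ARG2
}

def aco_augment(n_ants=10, n_iters=20, evaporation=0.1, seed=0):
    """Select substitutions with a simple ant-colony loop: each 'ant' samples
    one candidate per role-bearing slot in proportion to pheromone, and the
    best-scoring sentence of each iteration reinforces its choices."""
    rng = random.Random(seed)
    # One pheromone value per candidate word in each slot, initialised uniformly.
    pheromone = {slot: [1.0] * len(words) for slot, words in CANDIDATES.items()}

    def score(choice):
        # Placeholder fitness: reward variants that differ from the original
        # tokens. A real fitness would measure fluency and label preservation.
        return sum(CANDIDATES[s][i] != SENTENCE[s] for s, i in choice.items())

    best_choice, best_score = None, -1.0
    for _ in range(n_iters):
        ants = []
        for _ in range(n_ants):
            choice = {
                slot: rng.choices(range(len(words)), weights=pheromone[slot])[0]
                for slot, words in CANDIDATES.items()
            }
            ants.append((score(choice), choice))
        it_score, it_choice = max(ants, key=lambda a: a[0])
        if it_score > best_score:
            best_score, best_choice = it_score, it_choice
        # Evaporate all trails, then deposit pheromone on the iteration best.
        for slot, trail in pheromone.items():
            pheromone[slot] = [(1 - evaporation) * p for p in trail]
            pheromone[slot][it_choice[slot]] += it_score
    # Build the augmented sentence, leaving non-role tokens unchanged.
    out = list(SENTENCE)
    for slot, idx in best_choice.items():
        out[slot] = CANDIDATES[slot][idx]
    return " ".join(out)

print(aco_augment())  # e.g. "the film was dull"
```

The key point the sketch captures is that the search space is restricted to role-bearing slots, so every candidate sentence keeps the original predicate-argument structure while the ACO loop steers the substitutions toward higher-fitness variants.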
