Abstract
Data augmentation entails artificially expanding a dataset by applying various transformations to the existing raw data. Enhancing the quality and quantity of datasets of varying sizes through varied data augmentation techniques is of immense importance in the field of Natural Language Processing. Several notable applications, for instance text classification, sentiment analysis, and text summarization, have benefited immensely from text augmentation techniques. Hence, this paper focuses on efficient text classification using datasets of different sizes: small (500 instances), medium (5,564 instances), and large (43,934 instances). The work considers the standard DistilBERT model, a popular transformer-based language model, and presents the impact on the model's performance after employing different text augmentation techniques. The study specifically focuses on three augmentation methods: (a) synonym augmentation, which replaces words with their synonyms to enhance vocabulary diversity and generalization; (b) contextual word embeddings, which enrich semantic understanding by leveraging pre-trained language models; and (c) back translation, which translates the text into another language and then back again, introducing variations in the data and capturing different linguistic patterns. Additionally, the work discusses the combined effect of employing all three augmentation techniques simultaneously. Moreover, the study compares the relation between dataset size and the performance of the augmentation techniques. The study considers three standard datasets for the analysis and presents a comprehensive analysis using accuracy and F1 score as evaluation metrics. The results highlight the efficacy of each technique across small, medium, and large datasets, enabling a nuanced understanding of their benefits in different data scenarios.
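To make the first of these methods concrete, below is a minimal sketch of synonym augmentation. The toy synonym table, function name, and replacement probability are illustrative assumptions, not the authors' implementation; in practice, libraries such as nlpaug back this step with WordNet or pre-trained embedding models.

```python
import random

# Toy synonym table standing in for a lexical resource like WordNet.
# The entries here are purely illustrative.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
}

def synonym_augment(text, p=1.0, rng=None):
    """Replace each word that has a known synonym with probability p."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for word in text.split():
        if word in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[word]))
        else:
            out.append(word)
    return " ".join(out)

print(synonym_augment("a quick happy movie"))
```

Each augmented sentence preserves the original label, so running the augmenter once per training example roughly doubles the labeled data while keeping its meaning close to the original.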
The findings indicate varying degrees of improvement achieved through each augmentation technique. The enhancement achieved by applying text augmentation ranged from around 2% on large datasets to 20% on smaller datasets.