Abstract

Powerful yet simple augmentation techniques have significantly helped modern deep learning-based text classifiers become more robust in recent years. Although these augmentation methods have proven effective, they often rely on random or non-contextualized operations to generate new data. In this work, we modify a specific augmentation method, Easy Data Augmentation (EDA), with more sophisticated text editing operations powered by masked language models such as BERT and RoBERTa, and analyze the benefits and drawbacks of creating more linguistically meaningful and, ideally, higher-quality augmentations. Our analysis demonstrates that using a masked language model for word insertion almost always achieves better results than the original method, but it comes at the cost of additional time and resources, which can be partially offset by deploying a lighter and smaller language model such as DistilBERT.
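To illustrate the core idea, the sketch below shows one way masked-LM word insertion could replace EDA's random insertion: a mask token is placed at a random position and a masked language model proposes a contextually plausible word. This is a minimal illustration, not the authors' released code; the model choice (`distilbert-base-uncased`) and the helper name `mlm_insert` are assumptions for the example, using the Hugging Face `transformers` fill-mask pipeline.

```python
# Minimal sketch of masked-LM word insertion (illustrative, not the paper's code).
import random

from transformers import pipeline

# DistilBERT is used here as the lighter model the abstract mentions.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

def mlm_insert(sentence: str, n_insertions: int = 1) -> str:
    """Insert n_insertions model-predicted words at random positions."""
    words = sentence.split()
    for _ in range(n_insertions):
        # Pick a random slot (including the ends) and place a mask token there.
        pos = random.randint(0, len(words))
        masked = words[:pos] + [fill_mask.tokenizer.mask_token] + words[pos:]
        # Let the masked language model fill the slot; keep the top prediction.
        prediction = fill_mask(" ".join(masked), top_k=1)[0]["token_str"]
        words = words[:pos] + [prediction.strip()] + words[pos:]
    return " ".join(words)

# Example: produces a contextualized variant of the input sentence.
print(mlm_insert("the movie was surprisingly good", n_insertions=1))
```

Unlike EDA's random insertion, which draws a synonym of an existing word and places it arbitrarily, the inserted word here is conditioned on the full sentence context, which is the source of the quality gains (and the extra inference cost) discussed above.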
