An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

Jiaao Chen, Diyi Yang, Derek Tam, Colin Raffel, Mohit Bansal

doi:10.1162/tacl_a_00542

Abstract

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant time, money, or expertise is required to label massive amounts of textual data. Recently, data augmentation methods have been explored as a means of improving data efficiency in NLP. To date, there has been no systematic empirical overview of data augmentation for NLP in the limited labeled data setting, making it difficult to understand which methods work in which settings. In this paper, we provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting, summarizing the landscape of methods (including token-level augmentations, sentence-level augmentations, adversarial augmentations, and hidden-space augmentations) and carrying out experiments on 11 datasets covering topics/news classification, inference tasks, paraphrasing tasks, and single-sentence tasks. Based on the results, we draw several conclusions to help practitioners choose appropriate augmentations in different settings and discuss the current challenges and future directions for limited data learning in NLP.

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Mar 14, 2023
Citations: 32
License type: CC BY 4.0

Similar Papers

Data Augmentation for Building Footprint Segmentation in SAR Images: An Empirical Study
Sandhi Wangiyana ... Piotr Samczyński
Remote Sensing | VOL. 14
22 Apr 2022

Optimizing Data Augmentation for Semantic Segmentation on Small-Scale Dataset
Rui Ma ... Pin Tao
15 Jun 2019

A General Multiple Data Augmentation Based Framework for Training Deep Neural Networks
Binyan Hu ... Yu Sun
18 Jul 2022

Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization
Guanlin Li ... Tiejun Zhao
01 Jan 2019
