Abstract

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category. This paper explores data augmentation -- a technique particularly suitable for training with limited data -- for this few-shot, highly-multiclass text classification setting. On four diverse text classification tasks, we find that common data augmentation techniques can improve the performance of triplet networks by up to 3.0% on average. To further boost performance, we present a simple training strategy called curriculum data augmentation, which leverages curriculum learning by first training on only original examples and then introducing augmented data as training progresses. We explore a two-stage and a gradual schedule, and find that, compared with standard single-stage training, curriculum data augmentation trains faster, improves performance, and remains robust to high amounts of noising from augmentation.
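The two schedules described above can be sketched as a function mapping the training epoch to an augmentation strength: the two-stage schedule trains on original examples only, then switches to full-strength augmentation, while the gradual schedule ramps the strength up over time. This is a minimal illustrative sketch, not the authors' implementation; the function names, the `warmup_frac` parameter, and the toy adjacent-word-swap augmenter are assumptions for illustration.

```python
import random

def augmentation_strength(epoch, total_epochs, schedule="gradual",
                          warmup_frac=0.5, max_strength=0.3):
    """Return the augmentation strength (e.g., fraction of tokens to perturb)
    to use at a given epoch, under a curriculum schedule (illustrative only)."""
    warmup_epochs = warmup_frac * total_epochs
    if schedule == "two-stage":
        # Stage 1: original examples only; stage 2: full-strength augmentation.
        return 0.0 if epoch < warmup_epochs else max_strength
    # Gradual schedule: linearly ramp from 0 up to max_strength.
    return max_strength * min(1.0, epoch / warmup_epochs)

def swap_augment(tokens, strength, rng=random):
    """Toy noising augmenter (hypothetical): swap a strength-scaled number
    of adjacent token pairs. Stands in for any common augmentation technique."""
    tokens = list(tokens)
    n_swaps = int(len(tokens) * strength)
    for _ in range(n_swaps):
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens
```

In a training loop, each batch would call `augmentation_strength(epoch, total_epochs)` and pass the result to the augmenter, so early epochs see clean data and later epochs see increasingly noised data.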

Highlights

  • Traditional text classification tasks such as sentiment classification (Socher et al., 2013) typically have few output classes, each with many training examples; many practical scenarios such as relation classification (Han et al., 2018), answer selection (Kumar et al., 2019), and sentence clustering (Mnasri et al., 2017) have a converse setup characterized by a large number of output classes (Gupta et al., 2014), often with few training examples per class

  • We hypothesize that the few-shot, highly-multiclass text classification scenario is a suitable context for data augmentation

  • We propose a simple curriculum learning strategy called curriculum data augmentation


Summary

Introduction

Traditional text classification tasks such as sentiment classification (Socher et al., 2013) typically have few output classes (e.g., two in binary classification), each with many training examples. Many practical scenarios such as relation classification (Han et al., 2018), answer selection (Kumar et al., 2019), and sentence clustering (Mnasri et al., 2017) have a converse setup characterized by a large number of output classes (Gupta et al., 2014), often with few training examples per class. In this scenario, which we refer to as few-shot, highly-multiclass text classification, collecting sufficient labeled data is rarely feasible, and data augmentation is especially beneficial in such limited data scenarios (Xie et al., 2020).

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5493–5500

Curriculum Data Augmentation
Augmentation Techniques
Triplet Loss Model
Ablation
Related Work and Conclusions
Results for Various Augmentation Techniques
Findings
A Appendix
