Abstract
Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category. This paper explores data augmentation -- a technique particularly suitable for training with limited data -- for this few-shot, highly-multiclass text classification setting. On four diverse text classification tasks, we find that common data augmentation techniques can improve the performance of triplet networks by up to 3.0% on average. To further boost performance, we present a simple training strategy called curriculum data augmentation, which leverages curriculum learning by first training on only original examples and then introducing augmented data as training progresses. We explore a two-stage and a gradual schedule, and find that, compared with standard single-stage training, curriculum data augmentation trains faster, improves performance, and remains robust to high amounts of noising from augmentation.
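The abstract describes two curriculum schedules: a two-stage schedule (train on originals first, then add augmented data) and a gradual schedule (noise level increases as training progresses). As a rough illustration only, here is a minimal sketch of how such a schedule might build each epoch's training set; the `augment` function is a hypothetical stand-in (a toy word-swap noiser), not the augmentation techniques used in the paper, and the 0.3 strength cap is an arbitrary assumption:

```python
import random

def augment(text, strength):
    """Toy noising: swap a fraction of adjacent word pairs.
    A stand-in for real augmentation (e.g., synonym replacement)."""
    words = text.split()
    n_swaps = int(strength * max(len(words) - 1, 0))
    for _ in range(n_swaps):
        i = random.randrange(max(len(words) - 1, 1))
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def curriculum_epoch(examples, epoch, total_epochs, schedule="gradual"):
    """Build one epoch's training set under curriculum data augmentation.

    "two-stage": original examples only for the first half of training,
    then originals plus augmented copies at full strength.
    "gradual": augmentation strength ramps up linearly over epochs.
    """
    if schedule == "two-stage":
        strength = 0.3 if epoch >= total_epochs // 2 else 0.0
    else:  # gradual
        strength = 0.3 * epoch / max(total_epochs - 1, 1)
    epoch_set = list(examples)
    if strength > 0:
        # Introduce noised copies only once the curriculum allows it.
        epoch_set += [(augment(text, strength), label) for text, label in examples]
    return epoch_set
```

Under either schedule, early epochs see only clean originals, which is the core idea the abstract contrasts with standard single-stage training on the full augmented set.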
Highlights
Traditional text classification tasks such as sentiment classification (Socher et al., 2013) typically have few output classes, each with many training examples. Many practical scenarios, such as relation classification (Han et al., 2018), answer selection (Kumar et al., 2019), and sentence clustering (Mnasri et al., 2017), have the converse setup, characterized by a large number of output classes (Gupta et al., 2014), often with few training examples per class
We hypothesize that the few-shot, highly-multiclass text classification scenario is a suitable context for data augmentation
We propose a simple training strategy called curriculum data augmentation; applying curriculum learning in NLP applications can be challenging due to the scarcity of training data
Summary
Traditional text classification tasks such as sentiment classification (Socher et al., 2013) typically have few output classes (e.g., two in binary classification), each with many training examples. Many practical scenarios, such as relation classification (Han et al., 2018), answer selection (Kumar et al., 2019), and sentence clustering (Mnasri et al., 2017), have the converse setup, characterized by a large number of output classes (Gupta et al., 2014), often with few training examples per class. This scenario, which we refer to as few-shot, highly-multiclass text classification, is a suitable context for data augmentation: performance improvements from augmentation can be marginal when training data is sufficient, and augmentation is especially beneficial in limited-data scenarios (Xie et al., 2020).
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5493–5500