Abstract

In this paper, we propose a novel data augmentation method that respects the target context of the data via self-supervised learning. Instead of searching for exact synonyms of masked words, the proposed method finds words that can replace the original words given their context. Self-supervised learning can be performed with a masked language model (MLM), which masks a specific word within a sentence and predicts the original word, learning the context of a sentence from these asymmetric inputs and outputs. Rather than using the standard MLM, however, we propose a label-masked language model (LMLM) that attaches label information to the mask tokens, so that the MLM can be used effectively on labeled data. The augmentation method first performs self-supervised learning with the LMLM and then generates augmented data with the trained model. We demonstrate that the proposed method improves the classification accuracy of recurrent neural network- and convolutional neural network-based classifiers in experiments on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), Stanford Sentiment Treebank-2 (SST2), Subjectivity (Subj), Multi-Perspective Question Answering (MPQA), Movie Reviews (MR), and Text Retrieval Conference (TREC) datasets. In addition, because the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on it.
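
To make the augmentation procedure concrete, the sketch below builds a label-masked input by masking one word of a sentence and prepending a label token, then asks a fill-mask model for label- and context-aware replacements. This is a minimal sketch, not the paper's implementation: the label token format ([negative]/[positive]), the augment helper, and the use of an off-the-shelf bert-base-uncased checkpoint are illustrative assumptions; the paper instead trains the LMLM on the labeled task data itself, and its exact input format may differ.

```python
# Minimal sketch of label-masked language model (LMLM) style augmentation,
# assuming the class label is injected as an extra token next to the mask.
import random

from transformers import pipeline

# Hypothetical label tokens; in the paper's setting, label information would be
# part of the LMLM inputs used for self-supervised fine-tuning.
LABEL_TOKENS = {0: "[negative]", 1: "[positive]"}


def augment(sentence: str, label: int, fill_mask, top_k: int = 3):
    """Mask one random word, condition on the label token, and let the model
    propose context-aware replacements for the masked word."""
    words = sentence.split()
    idx = random.randrange(len(words))
    masked = list(words)
    masked[idx] = fill_mask.tokenizer.mask_token
    lmlm_input = f"{LABEL_TOKENS[label]} {' '.join(masked)}"
    augmented = []
    for candidate in fill_mask(lmlm_input, top_k=top_k):
        new_words = list(words)
        new_words[idx] = candidate["token_str"].strip()
        augmented.append(" ".join(new_words))
    return augmented


if __name__ == "__main__":
    # "bert-base-uncased" is only a stand-in; the paper fine-tunes the LMLM on
    # the labeled training data rather than relying on an off-the-shelf model.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    print(augment("the movie was surprisingly good", label=1, fill_mask=fill_mask))
```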

Highlights

  • The rapid development of effective and efficient machine learning and deep learning has changed the paradigm of methodologies in various fields

  • We demonstrate that the proposed method boosts the classification accuracy of recurrent neural network (RNN) [10]- and convolutional neural network (CNN) [30]-based classifiers through various experiments on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), Stanford Sentiment Treebank-2 (SST2), Subjectivity (Subj), Multi-Perspective Question Answering (MPQA), Movie Reviews (MR), and Text Retrieval Conference (TREC) datasets

Introduction

The rapid development of effective and efficient machine learning and deep learning has changed the paradigm of methodologies in various fields. Neural network-based models provide exceptional performance in a variety of computer vision (CV) tasks, including image classification [1], image generation [2], semantic segmentation [3], and object detection [4]. They also perform well in natural language processing (NLP) tasks such as machine translation [5], language modelling [6], question answering [7], sentiment analysis [8], and text classification [7].

