Abstract

Virtual adversarial training (VAT) is a powerful technique for improving model robustness in both supervised and semi-supervised settings. It is effective and easily applied to many image classification and text classification tasks. However, its benefits to sequence labeling tasks such as named entity recognition (NER) have not been as significant, largely because previous approaches could not combine VAT with the conditional random field (CRF). The CRF can significantly boost the accuracy of sequence models by constraining label transitions, which makes it an essential component in most state-of-the-art sequence labeling architectures. In this paper, we propose SeqVAT, a method that naturally applies VAT to sequence labeling models with a CRF. Empirical studies show that SeqVAT not only significantly improves sequence labeling performance over baselines under supervised settings, but also outperforms state-of-the-art approaches under semi-supervised settings.

Highlights

  • While they have achieved great success on various computer vision and natural language processing tasks, deep neural networks, even state-of-the-art models, are usually vulnerable to tiny input perturbations (Szegedy et al., 2014; Goodfellow et al., 2015).

  • Our evaluation demonstrates that SeqVAT brings significant improvements in supervised settings, in contrast to the marginal improvements reported by previous virtual adversarial training (VAT)-based approaches (Clark et al., 2018).

  • We adopt the neural-conditional random field (CRF) architecture via a CNN-LSTM-CRF model, which consists of one convolutional neural network (CNN) layer to generate character embeddings, two bidirectional long short-term memory (LSTM) layers as the encoder, and a CRF layer as the decoder (see the sketch below).
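
To make the architecture concrete, here is a minimal PyTorch sketch of such a CNN-LSTM-CRF model. The hyperparameters, class and argument names, and the use of the `pytorch-crf` package for the CRF layer are illustrative assumptions, not details taken from the paper.

```python
# Minimal CNN-LSTM-CRF sketch; dimensions are illustrative assumptions.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class CNNLSTMCRF(nn.Module):
    def __init__(self, word_vocab, char_vocab, num_tags,
                 word_dim=100, char_dim=30, char_filters=30, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character CNN: one conv layer, max-pooled over characters.
        self.char_cnn = nn.Conv1d(char_dim, char_filters,
                                  kernel_size=3, padding=1)
        # Two-layer bidirectional LSTM encoder.
        self.encoder = nn.LSTM(word_dim + char_filters, hidden_dim,
                               num_layers=2, bidirectional=True,
                               batch_first=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)
        # CRF decoder: models label-transition constraints.
        self.crf = CRF(num_tags, batch_first=True)

    def _embed(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        return torch.cat([self.word_emb(words), c], dim=-1)

    def forward(self, words, chars, tags=None):
        h, _ = self.encoder(self._embed(words, chars))
        e = self.emissions(h)
        if tags is not None:
            return -self.crf(e, tags)  # negative sequence log-likelihood
        return self.crf.decode(e)      # best tag sequence per sentence
```

Note that the CRF loss is a sequence-level log-likelihood over whole label paths, not a set of independent per-token softmaxes; this is precisely why the conventional token-wise VAT loss does not transfer directly to CRF models.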


Summary

Introduction

While they have achieved great success on various computer vision and natural language processing tasks, deep neural networks, even state-of-the-art models, are usually vulnerable to tiny input perturbations (Szegedy et al., 2014; Goodfellow et al., 2015).

To apply VAT to sequence labeling, Clark et al. (2018) proposed using a softmax layer on top of the token representations to obtain a label probability distribution for each token. In this fashion, VAT takes the KL divergence between tokens at the same position in the original sequence and the adversarial sequence as the adversarial loss. To apply conventional VAT to a model with a CRF, one can likewise calculate the KL divergence of each token's label distribution between the original and adversarial examples; this is sub-optimal because the transition probabilities are not taken into account. In semi-supervised settings, SeqVAT outperforms widely used methods such as self-training (ST) (Yarowsky, 1995) and entropy minimization (EM) (Grandvalet and Bengio, 2004), as well as the state-of-the-art semi-supervised sequence labeling algorithm, cross-view training (CVT) (Clark et al., 2018).
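
For reference, the conventional per-token VAT scheme described above can be sketched as follows. Here `model` is assumed to map embedded inputs to per-token label logits; the single power-iteration step and the `xi`/`eps` names follow common VAT implementations and are assumptions rather than the paper's exact formulation.

```python
# Sketch of conventional per-token VAT for sequence labeling: the KL
# divergence is computed token by token, ignoring CRF transitions.
import torch
import torch.nn.functional as F


def token_vat_loss(model, embeddings, xi=1e-6, eps=1.0):
    """KL divergence between per-token label distributions of the
    original input and a virtually adversarial perturbation of it."""
    with torch.no_grad():
        p = F.softmax(model(embeddings), dim=-1)   # (batch, seq, tags)
    # One power-iteration step to estimate the adversarial direction.
    d = xi * F.normalize(torch.randn_like(embeddings).flatten(1),
                         dim=1).view_as(embeddings)
    d.requires_grad_()
    q = F.log_softmax(model(embeddings + d), dim=-1)
    kl = F.kl_div(q, p, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    r_adv = eps * F.normalize(grad.flatten(1), dim=1).view_as(grad)
    # Adversarial loss: token-wise KL against the perturbed input.
    q_adv = F.log_softmax(model(embeddings + r_adv), dim=-1)
    return F.kl_div(q_adv, p, reduction="batchmean")
```

Because this loss compares each token's distribution independently, it cannot penalize perturbations that flip labels into sequences the CRF's transition matrix would forbid, which is the gap SeqVAT is designed to close.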

Sequence Labeling
Semi-Supervised Learning
Virtual Adversarial Training
Method
Model Architecture
Word Embeddings
Character CNN Layer
CRF Layer
Adversarial Training
SeqVAT
Training with Adversarial Loss
Experiment Settings
Dataset
Supervised Sequence Labeling
Semi-Supervised Sequence Labeling
K-best Selection in SeqVAT
Impact of Unlabeled Data
Comparison on Semi-Supervised Approaches
Conclusion
