AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network

Xinyu Wang, Tao Wang, Yong Jiang, Nguyen Bach, Kewei Tu, Fei Huang, Zhongqiang Huang

doi:10.18653/v1/2020.emnlp-main.485

Abstract

The linear-chain Conditional Random Field (CRF) model is one of the most widely used neural sequence labeling approaches. Exact probabilistic inference algorithms such as the forward-backward and Viterbi algorithms are typically applied in the training and prediction stages of the CRF model. However, these algorithms require sequential computation over the positions of a sentence, which makes parallelization impossible. In this paper, we propose to employ a parallelizable approximate variational inference algorithm for the CRF model. Based on this algorithm, we design an approximate inference network that can be connected with the encoder of the neural CRF model to form an end-to-end network, which is amenable to parallelization for faster training and prediction. The empirical results show that our proposed approaches achieve a 12.7-fold improvement in decoding speed on long sentences and accuracy competitive with the traditional CRF approach.

Highlights

Sequence labeling assigns a label to each token in a sequence.
To speed up training and prediction with the Conditional Random Field (CRF) layer, we propose the approximate inference network (AIN), a neural network derived from Mean-Field Variational Inference (MFVI) for approximate decoding in a linear-chain CRF.
Speed: We report the relative speed improvements over the CRF model, based on our PyTorch (Paszke et al., 2019) implementation run on a GPU server with an Nvidia Tesla V100.
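
As a rough illustration of the mean-field idea behind AIN, the sketch below runs synchronous mean-field updates on a first-order linear-chain CRF in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `softmax` and `mean_field_decode`, the initialization from the unary scores, the default of three iterations, and the toy scores are all illustrative choices that do not come from the text above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field_decode(unary, trans, iterations=3):
    """Approximate decoding for a linear-chain CRF (hypothetical helper).

    unary[i][y]  : score of label y at position i (e.g. from an encoder)
    trans[p][y]  : score of moving from label p to label y
    Returns (labels, q), where q[i] is the approximate marginal at position i.
    """
    n, k = len(unary), len(trans)
    # Assumption: start from the distribution given by the unary scores alone.
    q = [softmax(row) for row in unary]
    for _ in range(iterations):
        new_q = []
        # Each position reads only the previous iteration's q, so the updates
        # for different positions are independent of one another. A tensor
        # implementation could compute all of them in one batched operation.
        for i in range(n):
            logits = list(unary[i])
            for y in range(k):
                if i > 0:  # expected transition score from the left neighbor
                    logits[y] += sum(q[i - 1][p] * trans[p][y] for p in range(k))
                if i < n - 1:  # expected transition score to the right neighbor
                    logits[y] += sum(q[i + 1][s] * trans[y][s] for s in range(k))
            new_q.append(softmax(logits))
        q = new_q
    labels = [max(range(k), key=row.__getitem__) for row in q]
    return labels, q


# Toy example with two labels and three tokens (made-up scores).
toy_unary = [[2.0, 0.0], [0.0, 0.1], [2.0, 0.0]]
toy_trans = [[1.0, -1.0], [-1.0, 1.0]]
toy_labels, toy_q = mean_field_decode(toy_unary, toy_trans, iterations=3)
print(toy_labels)
```

In this toy input the unary scores alone slightly prefer label 1 at the middle position, while the transition scores favor keeping the same label as the neighbors. The number of iterations is fixed in advance, so the cost per sentence does not depend on a left-to-right pass over positions.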

Summary

Introduction

Sequence labeling assigns a label to each token in a sequence. Tasks such as Named Entity Recognition (NER) (Sundheim, 1995), Part-of-Speech (POS) tagging (DeRose, 1988) and chunking (Tjong Kim Sang and Buchholz, 2000) can all be formulated as sequence labeling tasks. BiLSTM-CRF (Huang et al., 2015; Lample et al., 2016; Ma and Hovy, 2016) is one of the most successful neural sequence labeling architectures. It feeds pretrained (contextual) word representations into a single-layer bidirectional LSTM (BiLSTM) encoder to extract contextual features and feeds these features into a CRF layer to predict the label sequence. The CRF layer is more difficult to replace because of its superior accuracy compared with faster alternatives in many tasks.
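
To make the cost of the CRF layer concrete, exact decoding in a first-order linear-chain CRF can be sketched with the standard Viterbi recursion. This is a minimal plain-Python sketch for illustration only: the function name `viterbi_decode`, the score layout (`unary[i][y]` for per-token label scores, `trans[p][y]` for transition scores), and the toy numbers are assumptions and are not taken from the paper's code.

```python
def viterbi_decode(unary, trans):
    """Exact best label sequence for a linear-chain CRF (hypothetical helper).

    unary[i][y] : score of label y at position i
    trans[p][y] : score of moving from label p to label y
    Returns (path, score) for the highest-scoring label sequence.
    """
    n, k = len(unary), len(trans)
    best = list(unary[0])  # best[y]: best score of any prefix ending in label y
    backptrs = []
    # This loop is inherently sequential: step i needs the scores from step i-1.
    for i in range(1, n):
        step_best, step_ptr = [], []
        for y in range(k):
            prev = max(range(k), key=lambda p: best[p] + trans[p][y])
            step_ptr.append(prev)
            step_best.append(best[prev] + trans[prev][y] + unary[i][y])
        best = step_best
        backptrs.append(step_ptr)
    # Follow the back-pointers from the best final label to recover the path.
    last = max(range(k), key=best.__getitem__)
    path = [last]
    for step_ptr in reversed(backptrs):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example with two labels and three tokens (made-up scores).
toy_unary = [[2.0, 0.0], [0.0, 0.1], [2.0, 0.0]]
toy_trans = [[1.0, -1.0], [-1.0, 1.0]]
toy_path, toy_score = viterbi_decode(toy_unary, toy_trans)
print(toy_path, toy_score)
```

The outer loop over positions cannot be reordered, because each step consumes the running scores of the previous one. That dependency is the sequential bottleneck described in the abstract.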

Methods

Results

Discussion

Conclusion