Abstract

Sequence labeling, such as part-of-speech (POS) tagging, named entity recognition (NER), and text chunking, is a classic task in natural language processing. Most existing neural network models for sequence labeling are based on recurrent neural networks (RNNs). Recently, convolutional neural networks (CNNs) have been proposed to replace the recurrent components for sequence labeling. However, they are usually shallow compared to the deep convolutional networks that achieve state-of-the-art performance in other fields. Due to the vanishing gradient problem, these models usually cannot work well when the number of layers is simply increased. In this paper, we propose a deep CNN architecture for sequence labeling that can capture a large context through stacked convolutions. To mitigate the vanishing gradient problem, the proposed method incorporates gated linear units, residual connections, and dense connections. Experimental results on three sequence labeling tasks show that the proposed model achieves performance competitive with the RNN-based state-of-the-art method while running $2.41\times$ faster, even with up to 10 convolutional layers.
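
To make the architectural ingredients concrete, below is a minimal sketch, assuming PyTorch, of one gated convolutional block combining a gated linear unit (GLU) with a residual connection. The class name `GatedConvBlock` and the sizes (`d_model`, kernel width, batch shape) are illustrative assumptions, not the paper's exact configuration, and the dense connections are omitted for brevity.

```python
# Sketch of one gated convolutional layer with a residual connection.
# Stacking several such blocks widens the receptive field, letting the
# model capture a large context without recurrence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """Conv1d producing 2*d channels, halved by a GLU, plus a residual."""
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        # 2*d_model output channels: one half carries content,
        # the other half the sigmoid gate used by the GLU.
        self.conv = nn.Conv1d(d_model, 2 * d_model,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model, seq_len)
        h = F.glu(self.conv(x), dim=1)  # GLU halves the channel dim
        return x + h                    # residual path eases gradient flow

# Usage with hypothetical sizes: 8 sentences, 128 channels, 50 tokens.
x = torch.randn(8, 128, 50)
block = GatedConvBlock(d_model=128)
print(block(x).shape)  # torch.Size([8, 128, 50])
```

Both the gate and the identity shortcut give gradients a path that bypasses the convolution's nonlinearity, which is why such blocks can be stacked to around 10 layers without the vanishing-gradient degradation the abstract describes.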
