Abstract

We propose a sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset. This language modeling objective incentivises the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks. The architecture was evaluated on a range of datasets, covering the tasks of error detection in learner texts, named entity recognition, chunking and POS-tagging. The novel language modeling objective provided consistent performance improvements on every benchmark, without requiring any additional annotated or unannotated data.
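One way to formalise the combined objective described above (notation ours; the summary itself gives no equations) is a labeling loss augmented with forward and backward language modeling terms, weighted by the γ that appears in the highlights below:

    \tilde{E} = E_{\text{label}} + \gamma\,(\overrightarrow{E} + \overleftarrow{E}), \qquad
    \overrightarrow{E} = -\sum_t \log P(w_{t+1} \mid \overrightarrow{h}_t), \quad
    \overleftarrow{E} = -\sum_t \log P(w_{t-1} \mid \overleftarrow{h}_t)

Here the forward hidden state at step t predicts the next word and the backward state predicts the previous word, so the secondary objective needs no data beyond the existing training sequences.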

Highlights

  • Accurate and efficient sequence labeling models have a wide range of applications, including named entity recognition (NER), part-of-speech (POS) tagging, error detection and shallow parsing

  • The proposed architecture was evaluated on 10 different sequence labeling datasets, covering the tasks of error detection, NER, chunking, and POS tagging

  • We performed experiments on the development data in which the value of γ was gradually decreased, but found that a small static value performed comparably or even better (see the sketch below). These experiments indicate that the language modeling objective helps the network learn general-purpose features that are useful for sequence labeling even in the later stages of training
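
A hedged Python sketch of this weighting (function names and the decay schedule are our illustration; the text says only that γ was gradually decreased):

    # Combined objective: labeling loss plus a gamma-weighted language modeling loss.
    def combined_loss(label_loss, lm_loss, gamma=0.1):
        return label_loss + gamma * lm_loss

    # One possible schedule for the "gradually decreased" variant tried in the
    # experiments; a small static gamma was found to work comparably or better.
    def decayed_gamma(step, gamma0=0.1, decay_rate=0.99):
        return gamma0 * decay_rate ** step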

Summary

Introduction

Accurate and efficient sequence labeling models have a wide range of applications, including named entity recognition (NER), part-of-speech (POS) tagging, error detection and shallow parsing. Recent work has shown that neural network architectures are able to achieve comparable or improved performance, while automatically discovering useful features for a specific task and only requiring a sequence of tokens as input (Collobert et al., 2011; Irsoy and Cardie, 2014; Lample et al., 2016). This feature discovery is usually driven by an objective function based on predicting the annotated labels for each word, without much incentive to learn more general language features from the available text. We therefore introduce a secondary objective: learning to predict the surrounding words for every word in the sentence. This secondary unsupervised objective encourages the framework to learn richer features for semantic composition without requiring additional training data. The multitask training framework gives the largest improvements on error detection datasets, outperforming the previous state-of-the-art architecture.
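
A minimal sketch of such an architecture in PyTorch (all class, function and parameter names are ours; the paper's exact configuration may differ): a bidirectional LSTM tagger whose forward states feed a next-word prediction head and whose backward states feed a previous-word prediction head, alongside the usual label head.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultitaskTagger(nn.Module):
        def __init__(self, vocab_size, n_labels, emb_dim=100, hidden_dim=100):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                                  batch_first=True)
            self.label_head = nn.Linear(2 * hidden_dim, n_labels)
            # Separate vocabulary-sized heads for the two LM directions.
            self.fw_lm_head = nn.Linear(hidden_dim, vocab_size)
            self.bw_lm_head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            h, _ = self.bilstm(self.embed(tokens))   # (batch, time, 2 * hidden)
            h_fw, h_bw = h.chunk(2, dim=-1)          # forward / backward states
            return (self.label_head(h),              # per-token label scores
                    self.fw_lm_head(h_fw),           # predicts word t+1
                    self.bw_lm_head(h_bw))           # predicts word t-1

    def multitask_loss(model, tokens, labels, gamma=0.1):
        label_scores, fw_scores, bw_scores = model(tokens)
        e_label = F.cross_entropy(label_scores.transpose(1, 2), labels)
        # Forward LM: the state at position t predicts the token at t+1.
        e_fw = F.cross_entropy(fw_scores[:, :-1].transpose(1, 2), tokens[:, 1:])
        # Backward LM: the state at position t predicts the token at t-1.
        e_bw = F.cross_entropy(bw_scores[:, 1:].transpose(1, 2), tokens[:, :-1])
        return e_label + gamma * (e_fw + e_bw)

Note that the language modeling targets are simply the input tokens shifted by one position, which is why the secondary objective requires no additional annotated or unannotated data.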

Full text sections:

  • Neural Sequence Labeling
  • Language Modeling Objective
  • Evaluation Setup
  • Error Detection
  • NER and Chunking
  • POS tagging
  • Related Work
  • Conclusion