Active Learning-Based Approach for Named Entity Recognition on Short Text Streams

Cuong Van Tran, Dinh Tuyen Hoang, Tuong Tri Nguyen, Ngoc Thanh Nguyen, Dosam Hwang

doi:10.1007/978-3-319-43982-2_28

Abstract

Named entity recognition (NER) plays an important role in many natural language processing (NLP) applications and is one of the fundamental tasks in building NLP systems. Supervised learning methods can achieve high performance, but they require large amounts of training data that are time-consuming and expensive to obtain. Active learning (AL) is well suited to many NLP problems in which unlabeled data are abundant but labeled data are scarce; it aims to minimize annotation cost while maximizing model performance. This study proposes a method for classifying named entities in streams of tweets from Twitter using AL with different query strategies. Samples are selected for labeling by human annotators using query by committee and diversity-based querying. Experiments on tweet data show promising results, with the proposed method outperforming the baseline.
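The abstract names two query strategies but does not give their formulas. As an illustration only, the sketch below shows one common way to combine them for token-level NER labels: committee disagreement is measured with vote entropy averaged over the tokens of a tweet, and diversity is enforced by skipping any candidate whose token set is too similar (Jaccard similarity) to a tweet already picked. The helper names `vote_entropy`, `jaccard`, and `select_batch`, the `max_similarity=0.5` threshold, and the toy tweets and committee predictions are all hypothetical assumptions, not the authors' method or data.

```python
import math
from collections import Counter


def vote_entropy(committee_labels):
    """Average per-token vote entropy for one tweet (hypothetical helper).

    committee_labels: one label sequence per committee member, all the same
    length as the tweet. Higher values mean more committee disagreement.
    """
    n_members = len(committee_labels)
    n_tokens = len(committee_labels[0])
    total = 0.0
    for i in range(n_tokens):
        # Count how many members voted for each label at token i.
        votes = Counter(seq[i] for seq in committee_labels)
        for count in votes.values():
            p = count / n_members
            total -= p * math.log(p)
    return total / n_tokens if n_tokens else 0.0


def jaccard(a, b):
    """Jaccard similarity between the token sets of two tweets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_batch(pool, committee_predictions, batch_size, max_similarity=0.5):
    """Pick up to batch_size tweets for annotation (hypothetical helper).

    Tweets are ranked by committee disagreement (query by committee); a tweet
    is skipped if it is too similar to one already chosen (diversity filter).
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda idx: vote_entropy(committee_predictions[idx]),
        reverse=True,
    )
    chosen = []
    for idx in ranked:
        if len(chosen) == batch_size:
            break
        if all(jaccard(pool[idx], pool[j]) <= max_similarity for j in chosen):
            chosen.append(idx)
    return chosen


# Toy unlabeled pool (tokenized tweets) and predictions from a 3-member committee.
pool = [
    ["Obama", "visits", "Paris"],
    ["Obama", "visits", "Paris", "today"],
    ["Apple", "releases", "iPhone"],
    ["good", "morning", "everyone"],
]
predictions = [
    [["B-PER", "O", "B-LOC"], ["B-ORG", "O", "B-ORG"], ["B-LOC", "O", "O"]],
    [["B-PER", "O", "B-LOC", "O"], ["B-ORG", "O", "B-LOC", "O"], ["B-LOC", "O", "B-ORG", "O"]],
    [["B-ORG", "O", "O"], ["B-ORG", "O", "O"], ["B-ORG", "O", "B-PER"]],
    [["O", "O", "O"], ["O", "O", "O"], ["O", "O", "O"]],
]

print(select_batch(pool, predictions, batch_size=2))  # prints [0, 2]
```

In this toy run, tweet 1 has the second-highest disagreement but shares three of its four tokens with tweet 0 (Jaccard 0.75), so the diversity filter skips it in favour of tweet 2. Ranking purely by disagreement would instead spend annotation effort on two near-duplicate tweets, which is the redundancy a diversity-based strategy is meant to avoid.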

Full Text