Abstract

This paper presents an empirical study of how to efficiently build named entity recognition (NER) systems when only a small amount of in-domain labeled data is available. Building on recent Transformer-based self-supervised pre-trained language models (PLMs), we investigate three orthogonal schemes to improve model generalization in few-shot settings: (1) meta-learning to construct prototypes for different entity types, (2) task-specific supervised pre-training on noisy web data to extract entity-related representations, and (3) self-training to leverage unlabeled in-domain data. On 10 public NER datasets, we perform extensive empirical comparisons of the proposed schemes and their combinations with various proportions of labeled data. Our experiments show that (i) in the few-shot learning setting, the proposed NER schemes significantly improve or outperform the commonly used baseline, a PLM-based linear classifier fine-tuned on domain labels, and (ii) we create new state-of-the-art results in both the few-shot and training-free settings compared with existing methods.
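The sketch below illustrates scheme (3) in its generic form: a model trained on the small labeled set assigns pseudo-labels to unlabeled in-domain tokens, and the model is then retrained on the union of gold labels and confident pseudo-labels. It is a minimal sketch under stated assumptions (a scikit-learn classifier over random stand-in features and an illustrative 0.9 confidence threshold), not the paper's exact procedure.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Self-training sketch for token-level classification.  In the paper's
    # setting the features would be contextual PLM embeddings; random vectors
    # are used here only to keep the example self-contained and runnable.
    rng = np.random.default_rng(0)
    num_labels = 3                                    # e.g. O, B-PER, B-ORG
    X_labeled = rng.normal(size=(60, 16))             # few labeled token features
    y_labeled = rng.integers(0, num_labels, size=60)  # their gold entity types
    X_unlabeled = rng.normal(size=(500, 16))          # unlabeled in-domain tokens

    X_train, y_train = X_labeled, y_labeled
    for _ in range(3):                                # a few self-training rounds
        teacher = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        probs = teacher.predict_proba(X_unlabeled)
        confident = probs.max(axis=1) > 0.9           # assumed confidence threshold
        pseudo_y = teacher.classes_[probs.argmax(axis=1)][confident]
        # Retrain on gold labels plus the confident pseudo-labels.
        X_train = np.concatenate([X_labeled, X_unlabeled[confident]])
        y_train = np.concatenate([y_labeled, pseudo_y])

    final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)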

Highlights

  • Named Entity Recognition (NER) involves processing unstructured text, locating and classifying named entities into particular categories of pre-defined entity types, such as persons, organizations, locations, medical codes, dates and quantities

  • To deal with the challenge of few-shot learning, where only a small number of labeled examples is available, we focus on improving the generalization ability of the model

  • It utilizes a dependency transfer mechanism to transfer label dependency information from source domains to target domains. SimBERT is a simple baseline reported in (Yang and Katiyar, 2020; Hou et al., 2020); it utilizes a nearest neighbor classifier based on the contextualized representation output by the pre-trained BERT, without fine-tuning on few-shot examples (a minimal sketch of this nearest-neighbor idea follows this list)

  • The prototype-based method only yields better results when there is very limited labeled data, i.e., when the number of both entity types and examples is small. When comparing column 5 with column 1, we observe that using self-training consistently works better than directly fine-tuning with labeled data only
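The sketch below illustrates the nearest-neighbor baseline described above: each test token is assigned the label of its closest labeled support token in embedding space, with no fine-tuning. It is a minimal, self-contained illustration only; random vectors stand in for the frozen BERT representations, and the labels and dimensions are arbitrary.

    import numpy as np

    # Support set: contextual token representations with their gold labels.
    # Random vectors stand in for frozen pre-trained BERT outputs here.
    rng = np.random.default_rng(1)
    dim = 32
    support_emb = rng.normal(size=(20, dim))
    support_labels = np.array(["O", "B-PER", "B-ORG", "O", "B-LOC"] * 4)

    def nearest_neighbor_label(query_emb):
        """Return the label of the most similar support token (cosine similarity)."""
        q = query_emb / np.linalg.norm(query_emb)
        s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
        return support_labels[int(np.argmax(s @ q))]

    # Classify one (mock) test-token embedding without any fine-tuning.
    print(nearest_neighbor_label(rng.normal(size=dim)))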


Summary

Introduction

Named Entity Recognition (NER) involves processing unstructured text, locating and classifying named entities (certain occurrences of words or expressions) into particular categories of pre-defined entity types, such as persons, organizations, locations, medical codes, dates and quantities. Even with recent pre-trained language models (PLMs), building NER systems remains a labor-intensive, time-consuming task: it requires rich domain knowledge and expert experience to annotate a large corpus of in-domain tokens before the models reach a reasonable accuracy. The training dataset for NER often consists of pair-wise data D_L = {(X_n, Y_n)}_{n=1}^N, where N is the number of training examples and each label sequence Y = [y_1, y_2, ..., y_T] contains one-hot vectors, each indicating the entity type of the corresponding token from a pre-defined discrete label space. We study few-shot data settings and explore three orthogonal directions, shown in Figure 1: (i) how to adapt meta-learning such as prototype-based methods for few-shot NER, (ii) how to leverage noisy web data for task-specific supervised pre-training, and (iii) how to exploit unlabeled in-domain data via self-training.
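The sketch below illustrates direction (i): one prototype (mean embedding) per entity type is computed from the few labeled support tokens, and a new token receives the entity type of its nearest prototype. As in the earlier sketches, random vectors stand in for contextual PLM embeddings and the tiny label space is purely illustrative; this shows the general prototype idea, not the paper's exact training objective.

    import numpy as np

    rng = np.random.default_rng(2)
    dim = 32
    label_space = ["O", "B-PER", "B-ORG"]            # pre-defined discrete label space

    # Few-shot support set: token embeddings (mocked) with entity-type labels.
    support_emb = rng.normal(size=(12, dim))
    support_labels = np.array([0, 1, 2] * 4)         # every type has a few examples

    # One prototype per entity type: the mean embedding of its support tokens.
    prototypes = np.stack([support_emb[support_labels == k].mean(axis=0)
                           for k in range(len(label_space))])

    def classify(token_emb):
        """Assign the entity type of the nearest prototype (Euclidean distance)."""
        dists = np.linalg.norm(prototypes - token_emb, axis=1)
        return label_space[int(np.argmin(dists))]

    print(classify(rng.normal(size=dim)))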

