Abstract

Few-shot learning under the $N$-way $K$-shot setting (i.e., $K$ annotated samples for each of $N$ classes) has been widely studied in relation extraction (e.g., FewRel) and image classification (e.g., Mini-ImageNet). Named entity recognition (NER) is typically framed as a sequence labeling problem in which the entity classes are inherently entangled, because the number and classes of entities in a sentence are not known in advance; as a result, the $N$-way $K$-shot NER problem has so far been unexplored. In this paper, we first formally define a more suitable $N$-way $K$-shot setting for NER. We then propose FewNER, a novel meta-learning approach for few-shot NER. FewNER separates the network into a task-independent part and a task-specific part. During training, the task-independent part is meta-learned across multiple tasks, while the task-specific part is learned for each individual task in a low-dimensional space. At test time, FewNER keeps the task-independent part fixed and adapts to a new task via gradient descent, updating only the task-specific part; this makes adaptation less prone to overfitting and more computationally efficient. Compared with pre-trained language models (e.g., BERT and ELMo), which obtain transferability implicitly by relying on large-scale corpora, FewNER explicitly optimizes the capability of "learning to adapt quickly" through meta-learning. The results demonstrate that FewNER outperforms nine baseline methods by significant margins in three adaptation experiments (i.e., intra-domain cross-type, cross-domain intra-type, and cross-domain cross-type), achieving state-of-the-art performance.
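To make the task-independent/task-specific split concrete, the following is a minimal PyTorch sketch of the adaptation scheme the abstract describes: a shared encoder that stays frozen at test time, and a small low-dimensional head that is updated on the support set via gradient descent. All names (`FewShotTagger`, `adapt_to_task`, the layer sizes, and the simple feed-forward encoder) are illustrative assumptions, not the actual FewNER architecture.

```python
import torch
import torch.nn as nn

class FewShotTagger(nn.Module):
    """Illustrative split: a task-independent encoder (meta-learned across
    tasks, frozen at test time) and a task-specific head adapted per task.
    The real FewNER model is more elaborate; this only shows the structure."""

    def __init__(self, vocab_size=10000, hidden=128, num_tags=5, task_dim=16):
        super().__init__()
        # Task-independent part: shared across all tasks.
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # Task-specific part: a low-dimensional adapter plus tag classifier.
        self.adapter = nn.Linear(hidden, task_dim)
        self.classifier = nn.Linear(task_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq) -> logits: (batch, seq, num_tags)
        return self.classifier(self.adapter(self.encoder(token_ids)))

def adapt_to_task(model, support_x, support_y, steps=5, lr=1e-2):
    """Test-time adaptation: keep the encoder fixed and update only the
    task-specific parameters on the N-way K-shot support set."""
    task_params = list(model.adapter.parameters()) + list(model.classifier.parameters())
    opt = torch.optim.SGD(task_params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        logits = model(support_x)
        loss = loss_fn(logits.view(-1, logits.size(-1)), support_y.view(-1))
        loss.backward()
        opt.step()
    return model
```

Because only the low-dimensional `adapter` and `classifier` receive gradients, each new task touches a small parameter subset, which is why this style of adaptation is cheaper and less prone to overfitting than fine-tuning the full network.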
