Abstract

Low-resource named entity recognition (NER) aims to identify entity mentions when training data is scarce. Recent approaches resort to distantly supervised data built from manual dictionaries, but such dictionaries are not always available for the target domain and offer limited entity coverage, which can introduce label noise. In this paper, we propose a novel Collaborative Teaching (CoTea) framework for low-resource NER with a few supporting labeled examples, which automatically augments training data and reduces label noise. Specifically, CoTea uses the entities in the supporting labeled examples to heuristically retrieve entity-related unlabeled data and then generates accurate distant labels with a novel mining-refining iterative mechanism. To optimize the distant labels, the mechanism mines potential entities from non-entity tokens with a recognition teacher and then refines entity labels with a second, prompt-based discrimination teacher in a divide-and-conquer manner. Experimental results on two benchmark datasets demonstrate that CoTea outperforms state-of-the-art baselines in low-resource settings and reaches 85% and 65% of the performance of the best high-resource baselines while using only about 2% of the labeled data.
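To make the mining-refining iteration concrete, here is a minimal sketch of how such a two-teacher loop could operate over BIO tags. All names (`mine_refine`, the toy teachers) and the control flow are assumptions for illustration, not the paper's actual implementation: a recognition teacher proposes entity spans among tokens currently labeled "O", and a discrimination teacher accepts or rejects each proposal, refining the distant labels each round.

```python
def mine_refine(tokens, distant_labels, recognition_teacher,
                discrimination_teacher, rounds=3):
    """Hypothetical sketch of a mining-refining loop (not the paper's code)."""
    labels = list(distant_labels)
    for _ in range(rounds):
        # Mining: the recognition teacher proposes candidate spans
        # inside non-entity ("O") regions of the current labeling.
        candidates = recognition_teacher(tokens, labels)
        # Refining: the discrimination teacher verifies each candidate
        # (e.g. via a cloze-style prompt) and assigns a type or rejects it.
        for start, end, proposed_type in candidates:
            verdict = discrimination_teacher(tokens, start, end, proposed_type)
            if verdict is not None:
                labels[start] = f"B-{verdict}"
                for i in range(start + 1, end):
                    labels[i] = f"I-{verdict}"
    return labels

# Toy teachers for demonstration only.
def toy_recognizer(tokens, labels):
    # Propose any capitalized token still labeled "O" as a PER candidate.
    return [(i, i + 1, "PER") for i, (t, l) in enumerate(zip(tokens, labels))
            if l == "O" and t[0].isupper()]

def toy_discriminator(tokens, start, end, proposed_type):
    # Accept only a known person name in this toy setting.
    return proposed_type if tokens[start] == "Obama" else None

tokens = ["Obama", "visited", "Paris"]
distant = ["O", "O", "B-LOC"]  # "Paris" already carries a distant label
print(mine_refine(tokens, distant, toy_recognizer, toy_discriminator))
# → ['B-PER', 'O', 'B-LOC']
```

The divide-and-conquer aspect shows up in the separation of roles: the recognizer only has to find plausible spans, while the discriminator only has to judge one span at a time, which keeps noisy proposals from contaminating the labels directly.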
