Abstract

As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets in CS are proposed rapidly, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the distant annotations are noisy, because there is no uniform entity representation standard in the CS domain. To alleviate these two problems, we propose a novel self-training method based on a pretrained language model, paired with a system that automatically constructs distantly supervised labels in CS (SNER-CS). Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
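To make the distant-supervision setup concrete, the following is a minimal sketch (not the paper's system) of how training labels can be generated by string matching against an external dictionary; the dictionary entries, entity types, and example sentence are hypothetical.

```python
# Illustrative dictionary mapping entity strings to CS entity types.
# Entries here are hypothetical, not taken from the paper.
DICTIONARY = {
    "named entity recognition": "TASK",
    "BERT": "METHOD",
    "CoNLL-2003": "DATASET",
}

def distant_annotate(tokens):
    """Assign BIO tags by greedy longest-match lookup in DICTIONARY."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in DICTIONARY:
                etype = DICTIONARY[span]
                tags[i] = f"B-{etype}"
                for k in range(i + 1, j):
                    tags[k] = f"I-{etype}"
                i = j
                matched = True
                break
        if not matched:
            i += 1
    return tags

tokens = "We fine-tune BERT for named entity recognition".split()
print(list(zip(tokens, distant_annotate(tokens))))
# [('We', 'O'), ('fine-tune', 'O'), ('BERT', 'B-METHOD'),
#  ('for', 'O'), ('named', 'B-TASK'), ('entity', 'I-TASK'),
#  ('recognition', 'I-TASK')]
```

This toy matcher also illustrates both challenges the abstract raises: any entity missing from the dictionary is silently tagged `O` (incomplete coverage), and surface-form string matching cannot distinguish ambiguous or inconsistently written mentions (noisy annotation).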
