Abstract

As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets in CS are proposed rapidly, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the distant annotations are noisy, because there is no uniform entity representation standard in the CS domain. To alleviate these two problems, we propose a novel self-training method based on a pretrained language model, paired with a system that automatically constructs distantly supervised labels in CS (SNER-CS). Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
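To make the distant-supervision setup concrete, the following is a minimal sketch (not the paper's system) of how training labels can be generated by string matching against an external dictionary; the dictionary entries, entity types, and example sentence are hypothetical.

```python
# Illustrative dictionary mapping entity strings to CS entity types.
# Entries here are hypothetical, not taken from the paper.
DICTIONARY = {
    "named entity recognition": "TASK",
    "BERT": "METHOD",
    "CoNLL-2003": "DATASET",
}

def distant_annotate(tokens):
    """Assign BIO tags by greedy longest-match lookup in DICTIONARY."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in DICTIONARY:
                etype = DICTIONARY[span]
                tags[i] = f"B-{etype}"
                for k in range(i + 1, j):
                    tags[k] = f"I-{etype}"
                i = j
                matched = True
                break
        if not matched:
            i += 1
    return tags

tokens = "We fine-tune BERT for named entity recognition".split()
print(list(zip(tokens, distant_annotate(tokens))))
# [('We', 'O'), ('fine-tune', 'O'), ('BERT', 'B-METHOD'),
#  ('for', 'O'), ('named', 'B-TASK'), ('entity', 'I-TASK'),
#  ('recognition', 'I-TASK')]
```

This toy matcher also illustrates both challenges the abstract raises: any entity missing from the dictionary is silently tagged `O` (incomplete coverage), and surface-form string matching cannot distinguish ambiguous or inconsistently written mentions (noisy annotation).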
