Abstract

Detecting named entities in documents is one of the most important tasks in knowledge engineering. Previous studies rely on annotated training data, but large annotated data sets are expensive to obtain, which limits the effectiveness of recognition. In this research, we propose a semi-supervised learning approach for named entity recognition (NER) via automatic labeling and tri-training, which makes use of unlabeled data and structured resources containing known named entities. By modifying tri-training for sequence labeling and deriving a proper initialization, we can train an NER model for Web news articles automatically with satisfactory performance. In the task of Chinese personal name extraction from 8,672 news articles on the Web (364,685 sentences containing 54,449 person names, 11,856 of them distinct), an F-measure of 90.4% can be achieved.

Keywords: Named entity extraction, co-labeling method, tri-training
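The two ingredients named in the abstract, automatic labeling from a list of known entities and agreement-based labeling in tri-training, can be illustrated with a minimal sketch. The abstract does not specify the matching strategy, the tag set, or how tri-training is modified for sequences, so everything here is an assumption made for illustration: the function names `auto_label` and `co_label`, the B-PER/I-PER/O tag scheme, longest-match lookup, and the rule that two taggers must agree on the whole sentence before their output is passed to the third.

```python
from typing import Callable, List, Optional, Sequence, Set

# A tagger maps a token sequence to one tag per token (hypothetical interface).
Tagger = Callable[[Sequence[str]], List[str]]


def auto_label(tokens: Sequence[str], known_names: Set[str],
               max_len: int = 4) -> List[str]:
    """Tag tokens by longest match against a set of known person names.

    Assumption: tokens are single characters (as in Chinese text) and a
    name is the concatenation of up to `max_len` consecutive tokens.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + n]) in known_names:
                match = n
                break
        if match:
            tags[i] = "B-PER"
            for j in range(i + 1, i + match):
                tags[j] = "I-PER"
            i += match
        else:
            i += 1
    return tags


def co_label(tokens: Sequence[str], other_a: Tagger,
             other_b: Tagger) -> Optional[List[str]]:
    """Return a label sequence for the third tagger, or None.

    Assumption: an unlabeled sentence is accepted only when the two
    other taggers produce identical tag sequences for it.
    """
    tags_a = other_a(tokens)
    tags_b = other_b(tokens)
    return tags_a if tags_a == tags_b else None
```

In this sketch, sentences labeled by `auto_label` would seed the initial training sets, and `co_label` would be called once per unlabeled sentence for each of the three taggers in turn. Whole-sentence agreement is a deliberately strict choice; a confidence-weighted rule would be an equally plausible reading of "modifying tri-training for sequence labeling".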
