Abstract

Named entities carry essential meaning and information in natural language. Named Entity Recognition (NER) therefore has applications in many Natural Language Processing tasks, such as Information Retrieval, Information Extraction, Machine Translation, and Question Answering. State-of-the-art NER systems are based on supervised machine learning algorithms, which require large amounts of training data. The main problem, however, is that constructing corpora annotated with named entities is expensive, labor-intensive, and time-consuming. In this paper, we therefore propose an approach that improves monolingual NER systems by exploiting an existing unannotated English-Chinese bilingual corpus. The system jointly recognizes named entities in both the English and the Chinese sentences by applying bilingual constraints. Experimental results show that NER improves for both Chinese and English compared with the strong baseline StanfordNER. In particular, Chinese NER improves significantly, by 20.81% in terms of F1-score, while English NER F1-score increases slightly, from 75.75% to 76.08%. Compared with the state-of-the-art system for improving NER with bilingual resources, our system outperforms it on the Chinese NER task by 5.99% and achieves comparable results on the English side. The proposed method can also be generalized to resource-limited languages.
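The results above are reported as F1-scores. As a point of reference only, the following is a minimal sketch of entity-level F1 under the common exact-match convention, where a predicted entity counts as correct only if its span and type both match a gold entity. This convention is an assumption, since the evaluation protocol is not stated here, and the tuple layout and the name `entity_f1` are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (sentence_id, start, end, type).
# Exact-match scoring (same span and same type) is assumed.
Entity = Tuple[int, int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Entity-level F1: harmonic mean of precision and recall over exact matches."""
    tp = len(gold & pred)  # predicted entities that exactly match a gold entity
    if tp == 0:
        return 0.0  # also covers empty gold or empty prediction sets
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 0, 2, "PER"), (0, 5, 6, "LOC"), (1, 3, 4, "ORG")}
    pred = {(0, 0, 2, "PER"), (0, 5, 6, "ORG"), (1, 3, 4, "ORG")}  # one wrong type
    print(round(entity_f1(gold, pred), 3))  # 0.667
```

Here two of the three predictions match exactly, so precision and recall are both 2/3 and F1 is 2/3. Under this kind of metric, the English change from 75.75% to 76.08% corresponds to 0.33 absolute points.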
