Domain-specific Named Entity Recognition with Document-Level Optimization

Limin Wang, Shoushan Li, Guodong Zhou, Qian Yan

doi:10.1145/3213544

Abstract

Previous studies normally formulate named entity recognition (NER) as a sequence labeling task and optimize the solution at the sentence level. In this article, we propose a document-level optimization approach to NER and apply it to a domain-specific document-level NER task. As a baseline, we apply a state-of-the-art approach, i.e., long short-term memory (LSTM), to perform word classification. On this basis, we define a global objective function over the obtained word classification results and achieve global optimization via Integer Linear Programming (ILP). Specifically, in the ILP-based approach, we propose four kinds of constraints, i.e., label transition, entity length, label consistency, and domain-specific regulation constraints, to incorporate various entity recognition knowledge at the document level. Empirical studies demonstrate the effectiveness of the proposed approach to domain-specific document-level NER.
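To make the idea of constrained global decoding concrete, the following is a minimal sketch, not the formulation used in the article. It assumes a BIO tagging scheme, per-word label probabilities such as a classifier might produce, and a single label transition constraint ("I" may only follow "B" or "I"). Exhaustive search stands in for an ILP solver so that the example needs no external library; the names `LABELS`, `transition_ok`, `is_valid`, and `decode` are hypothetical.

```python
import math
from itertools import product

# Assumed BIO tag set; the article's actual label set is not given in the abstract.
LABELS = ("B", "I", "O")


def transition_ok(prev, cur):
    """Label transition constraint: 'I' may only follow 'B' or 'I'."""
    return not (cur == "I" and prev not in ("B", "I"))


def is_valid(seq):
    """Check every adjacent label pair, treating the start as if preceded by 'O'."""
    prev = "O"
    for cur in seq:
        if not transition_ok(prev, cur):
            return False
        prev = cur
    return True


def decode(probs):
    """Return the valid label sequence with the highest total log-probability.

    probs is a list with one dict per word, mapping each label to a
    probability greater than zero. Exhaustive enumeration is used here
    in place of an ILP solver, so this only scales to very short inputs.
    """
    best_seq, best_score = None, -math.inf
    for seq in product(LABELS, repeat=len(probs)):
        if not is_valid(seq):
            continue
        score = sum(math.log(p[label]) for p, label in zip(probs, seq))
        if score > best_score:
            best_seq, best_score = seq, score
    return list(best_seq)


if __name__ == "__main__":
    # Hypothetical classifier output for three words.
    example = [
        {"B": 0.2, "I": 0.5, "O": 0.3},
        {"B": 0.1, "I": 0.7, "O": 0.2},
        {"B": 0.1, "I": 0.1, "O": 0.8},
    ]
    print(decode(example))
```

In this toy input, picking the most probable label for each word independently would start the sequence with "I", which the transition constraint forbids; the constrained search instead selects the best sequence among the valid ones. Entity length, label consistency, and domain-specific constraints could presumably be added as further validity checks in the same way, but their exact form is not specified in the abstract.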
