Chinese question entity discovery and linking (QEDL) may have to cope with short texts and small-scale annotated datasets, conditions that can render some machine learning algorithms ineffective. In this paper, we propose a progressive joint framework for Chinese QEDL that exploits the mutual dependency between these two tasks so that each improves the performance of the other. The framework uses the candidate entity generation (CEG) step of entity linking to iteratively augment the entity discovery process, which consists of mention generation, mention filtering and mention merging modules. In the mention generation module, to reduce the hand-crafted effort of rule-based entity discovery, we develop a question representation method that generates domain-independent entity discovery rules, and we use CEG to check the extracted mentions in priority order. This module can feed the extracted mentions into other entity discovery methods, either as a feature or as extra mentions, to compensate for the limited size of annotated datasets. The mention filtering module builds a voting model on the joint features of the extracted mentions and the entities returned by CEG, and filters out low-confidence mentions. The mention merging module then merges the mention-entity pairs produced by different patterns and checks their corresponding candidate entities with CEG. For entity linking, we incorporate the joint features of questions, extracted mentions and CEG's entities into a ranking model for entity disambiguation. Finally, we conduct experiments on two real datasets and compare our approach with other state-of-the-art methods. The results show that the proposed framework reduces error accumulation, flexibly combines different entity discovery methods, and significantly improves performance on small-scale datasets.
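To make the idea of a CEG-assisted voting filter more concrete, the following is a minimal sketch under stated assumptions, not the authors' model. The abstract does not specify the features or the voting rule, so everything here is hypothetical: `ALIAS_TABLE` is a toy stand-in for a knowledge-base alias dictionary, `generate_candidates` is an exact-match CEG, and the four binary features and the `threshold` of 3 were chosen only for illustration. Each feature casts one vote, and a mention is kept only if it collects enough votes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical alias table standing in for candidate entity generation (CEG):
# it maps a surface string to the knowledge-base entities it may refer to.
ALIAS_TABLE: dict[str, list[str]] = {
    "李白": ["李白(唐代诗人)", "李白(歌曲)"],
    "静夜思": ["静夜思(李白诗作)"],
    "诗": ["诗(文学体裁)"],
}


def generate_candidates(mention: str) -> list[str]:
    """Toy CEG: exact alias lookup (a real system would also match fuzzily)."""
    return ALIAS_TABLE.get(mention, [])


@dataclass
class Mention:
    text: str
    sources: set[str]  # entity discovery methods that proposed this mention


def vote(mention: Mention) -> int:
    """Count how many hypothetical joint mention/CEG features support the mention."""
    candidates = generate_candidates(mention.text)
    votes = 0
    votes += bool(candidates)                # CEG returns at least one entity
    votes += len(mention.text) >= 2          # not a single-character fragment
    votes += len(mention.sources) >= 2       # proposed by several methods
    votes += any(mention.text in c for c in candidates)  # appears in an entity name
    return votes


def filter_mentions(mentions: list[Mention], threshold: int = 3) -> list[Mention]:
    """Keep only mentions whose vote count reaches the (assumed) threshold."""
    return [m for m in mentions if vote(m) >= threshold]


mentions = [
    Mention("李白", {"rules", "crf"}),
    Mention("静夜思", {"rules"}),
    Mention("诗", {"crf"}),
    Mention("的", {"rules"}),
]
print([m.text for m in filter_mentions(mentions)])  # ['李白', '静夜思']
```

Because the CEG lookup both supplies votes and confirms that a mention can be linked at all, fragments such as "的" (no candidates) or "诗" (single character from one source) are dropped before entity linking. This illustrates, in simplified form, how feeding CEG back into entity discovery can limit error accumulation.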