Chinese cross-document co-reference resolution based on SVM classification and semantics

Zhiwei Zhao, Jinghang Gu, Longhua Qian, Guodong Zhou, Yanan Hu

doi:10.3724/sp.j.1087.2013.00984

Abstract

Cross-Document Co-reference Resolution (CDCR) aims to merge mentions that are scattered across different texts but refer to the same entity into co-reference chains. Traditional research on CDCR uses clustering methods to address the name disambiguation problem that arises in information retrieval. This paper recasts CDCR as a classification problem, using a Support Vector Machine (SVM) classifier to resolve both name disambiguation and variant consolidation, both of which are prevalent in information extraction. The method can effectively integrate various features, such as morphological, phonetic, and semantic knowledge collected from the corpus and the Internet. Experiments on a Chinese cross-document co-reference corpus show that the classification method outperforms clustering methods in both precision and recall.
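
The classification view described above can be illustrated with a minimal sketch: each pair of mentions is turned into a feature vector, a binary classifier decides whether the pair is co-referent, and positive pairs are merged transitively into chains. Everything in the sketch is an assumption made for illustration and is not taken from the paper: the toy English mentions, the two Jaccard features (a rough stand-in for the morphological and semantic features), and the hand-set `WEIGHTS` and `BIAS`, which stand in for the decision function of a trained linear SVM.

```python
from itertools import combinations

# Toy mentions: (document id, surface name, context words). Invented data.
MENTIONS = [
    ("d1", "Bill Gates", {"microsoft", "founder", "software"}),
    ("d2", "Gates", {"microsoft", "software", "billionaire"}),
    ("d3", "Bill Gates", {"guitar", "band", "album"}),
    ("d4", "William Gates", {"founder", "microsoft", "philanthropy"}),
]


def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def features(m1, m2):
    """Pairwise features: name-token overlap and context-word overlap."""
    name_sim = jaccard(m1[1].lower().split(), m2[1].lower().split())
    context_sim = jaccard(m1[2], m2[2])
    return (name_sim, context_sim)


# Hypothetical parameters standing in for a trained linear SVM (w . x + b).
WEIGHTS = (1.0, 3.0)
BIAS = -1.2


def is_coreferent(m1, m2):
    """Binary decision for one mention pair: sign of the linear score."""
    score = sum(w * x for w, x in zip(WEIGHTS, features(m1, m2))) + BIAS
    return score > 0


def build_chains(mentions):
    """Merge positively classified pairs into chains with union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention[0])
    return sorted(chains.values())


if __name__ == "__main__":
    print(build_chains(MENTIONS))
```

On this toy data the two mentions with the identical name "Bill Gates" but unrelated contexts stay apart (name disambiguation), while "Gates" and "William Gates" join the chain of the first mention (variant consolidation). In a real system the weights would be learned from labeled mention pairs rather than set by hand.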
