Chinese Personal Name Disambiguation Based on Clustering

Chao Fan, Yu Li

doi:10.1155/2021/3790176

Abstract

Personal name disambiguation is a significant issue in natural language processing and underpins many automatic information processing tasks. This research explores Chinese personal name disambiguation based on a clustering technique. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. Next, we extract features that better disambiguate Chinese personal names, and we create rules for identifying target personal names to improve the experimental results. Several methods for calculating feature weights are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For clustering, an agglomerative hierarchical clustering algorithm is selected after comparison with other clustering methods. Finally, a labeling approach is employed to identify feature words that represent each cluster. The experiment achieves good results for five groups of Chinese personal names.
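To make the weighting and clustering steps concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the documents are already word-segmented, uses one common tf-idf variant (term count times log(N/df)), cosine similarity, and average-linkage agglomerative clustering that stops at a similarity threshold. The function names, the threshold value, and the toy documents (including the name 李明) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse tf-idf vectors (dict: term -> weight).

    Assumption: weight = raw term count * log(N / document frequency).
    A term that occurs in every document (such as the target name) gets weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, threshold=0.1):
    """Average-linkage agglomerative clustering over document indices.

    Repeatedly merges the two most similar clusters and stops once the best
    average similarity falls below `threshold` (a hypothetical cut-off).
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy, already-segmented documents that all mention the same (hypothetical) name.
docs = [
    ["李明", "教授", "大学", "论文"],
    ["李明", "教授", "大学", "研究"],
    ["李明", "球员", "比赛", "进球"],
    ["李明", "球员", "比赛", "联赛"],
]
clusters = agglomerative(tfidf_vectors(docs))
print(clusters)
```

In this toy data the shared name carries no weight because it appears in every document, so the grouping is driven by the surrounding context words, which is the intuition behind extracting discriminative features before clustering.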

Journal: Wireless Communications and Mobile Computing
Publication Date: Jan 1, 2021
License type: CC BY 4.0

