Extracting Location Names from Chinese Texts Based on SVM and KNN

Lishuang Li, Degen Huang, Tingting Mao

doi:10.1109/nlpke.2005.1598764

Abstract

This paper presents a method for extracting location names from Chinese texts based on support vector machines (SVM) and K nearest neighbors (KNN). The features of each vector are the character itself, its character-based part-of-speech (POS) tag, whether the character appears in a table of characteristic words for location names, and context information. An SVM-based model is built to extract location names. To improve the accuracy of the SVM classifier, the KNN algorithm is introduced; furthermore, to handle unbalanced data, a modified SVM-KNN classifier is proposed. Experimental results show that the model is effective at identifying location names in Chinese texts: in the open test, recall, precision and F-measure reach 90.38%, 92.12% and 91.24%, respectively. The hybrid machine learning model based on SVM and KNN can be used to recognize location names and other unknown words, such as person names and organization names, in Chinese texts. The modified SVM-KNN model can be generalized to other machine learning problems with unbalanced class distributions.
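The reported open-test figures can be cross-checked against one another. The sketch below assumes that the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the weighting, so the `beta` parameter and the function name `f_measure` are illustrative choices rather than the authors' definitions.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "F-measure" is the balanced F1; beta is
    exposed only to make that assumption explicit.
    """
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Open-test figures quoted in the abstract, in percent.
recall = 90.38
precision = 92.12

f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.2f}%")  # prints "F-measure: 91.24%"
```

Under this assumption the computed value matches the reported 91.24%, so the three figures are mutually consistent.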
