Abstract

This paper presents a method for extracting location names from Chinese texts using a support vector machine (SVM) combined with k-nearest neighbors (KNN). Four kinds of features make up each vector: the character itself, its character-based part-of-speech (POS) tag, whether the character appears in a table of words characteristic of location names, and context information. An SVM-based model is first built to extract location names. The KNN algorithm is then introduced to improve the accuracy of the SVM classifier, and a modified SVM-KNN classifier is proposed to handle unbalanced data. Experimental results show that the model identifies location names in Chinese texts effectively: in the open test, recall, precision and F-measure reach 90.38%, 92.12% and 91.24%, respectively. The hybrid SVM-KNN model can be used to recognize location names and other unknown words, such as person names and organization names, in Chinese texts. The modified SVM-KNN model can also be generalized to other machine learning problems with unbalanced class distributions.
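
The four feature types named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-character table `LOCATION_CUE_CHARS`, the padding symbol `PAD`, the function `char_features`, the window size, and the POS tags in the usage example are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from typing import Dict, Sequence, Set, Union

# Hypothetical characteristic-word table: characters that often close a
# Chinese location name. The paper's actual table is not given in the abstract.
LOCATION_CUE_CHARS: Set[str] = {"省", "市", "县", "区", "镇", "村", "路", "街", "山", "河", "湖"}

# Hypothetical padding symbol for context positions outside the sentence.
PAD = "<PAD>"


def char_features(
    chars: Sequence[str],
    pos_tags: Sequence[str],
    index: int,
    window: int = 2,
    cue_chars: Set[str] = LOCATION_CUE_CHARS,
) -> Dict[str, Union[str, bool]]:
    """Build a feature dict for the character at `index`.

    For every offset in [-window, +window] the dict records the character,
    its character-based POS tag, and whether the character is in the cue
    table. Offset 0 is the character itself; the other offsets are context.
    """
    if len(chars) != len(pos_tags):
        raise ValueError("chars and pos_tags must have the same length")
    if not 0 <= index < len(chars):
        raise IndexError("index is outside the sentence")

    feats: Dict[str, Union[str, bool]] = {}
    for offset in range(-window, window + 1):
        j = index + offset
        inside = 0 <= j < len(chars)
        ch = chars[j] if inside else PAD
        tag = pos_tags[j] if inside else PAD
        feats[f"char[{offset:+d}]"] = ch
        feats[f"pos[{offset:+d}]"] = tag
        feats[f"cue[{offset:+d}]"] = inside and ch in cue_chars
    return feats


if __name__ == "__main__":
    # Illustrative sentence with made-up character-level POS tags.
    sentence = list("我在北京市工作")
    tags = ["r", "p", "n", "n", "n", "v", "v"]
    for name, value in char_features(sentence, tags, index=4, window=1).items():
        print(name, value)
```

Each feature dict would still need to be converted into a numeric vector (for example by one-hot encoding the string-valued entries) before it could be passed to an SVM or KNN classifier; that step is omitted here.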
