Abstract

Natural language processing plays a critical role in intelligent computing, pattern recognition, semantic analysis, and machine intelligence. In Chinese information processing, predictive models of semantic word-formation patterns built from a large-scale corpus can significantly improve the efficiency and accuracy of paraphrasing unregistered or new words, ambiguity elimination, automatic lexicography, machine translation, and other applications. This requires finding the relationship between word-formation patterns and the factors that influence them, which can be formulated as a classification problem. However, because semantic word-formation data contain noise, anomalies, imprecision, polysemy, ambiguity, nonlinear structure, and class imbalance, the multi-criteria optimization classifier (MCOC), support vector machines (SVM), and other traditional classification approaches give poor predictive performance. In this paper, based on an analysis of the characteristics of Chinese word formation, we first propose a novel layered semantic graph for each disyllabic word, a layer-weighted graph edit distance (GED), and a corresponding similarity kernel that embeds the words into a new vector space. On the normalized data, MCOC with kernel, fuzzification, and penalty factors (KFP-MCOC) and SVM are then employed to predict Chinese semantic word-formation patterns. Our experimental results and a comparison with SVM show that KFP-MCOC based on the layer-weighted semantic graphs increases the separation between patterns, the predictive accuracy on target patterns, and the generalization of semantic pattern classification to new compound words.
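The idea of turning a layer-weighted distance between word graphs into a similarity kernel can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer names, the weights in `LAYER_WEIGHTS`, the toy graphs, the use of a weighted edge-set difference as a stand-in for true graph edit distance, and the exponential form `exp(-gamma * d)` of the kernel.

```python
import math

# Hypothetical layer weights; the paper's actual layers and weights are not given here.
LAYER_WEIGHTS = {"morpheme": 1.0, "sense": 2.0, "category": 0.5}


def layer_weighted_distance(g1, g2, weights=LAYER_WEIGHTS):
    """Weighted count of edge insertions and deletions, summed over layers.

    Simplified stand-in for graph edit distance: nodes are assumed to be
    aligned already, so only edge edits are counted.
    """
    total = 0.0
    for layer, w in weights.items():
        e1 = g1.get(layer, set())
        e2 = g2.get(layer, set())
        total += w * len(e1 ^ e2)  # symmetric difference = edges to add or delete
    return total


def similarity_kernel(g1, g2, gamma=0.5):
    """Map a distance to a similarity in (0, 1] (assumed exponential form)."""
    return math.exp(-gamma * layer_weighted_distance(g1, g2))


def kernel_matrix(graphs, gamma=0.5):
    """Pairwise similarity matrix, usable as a precomputed kernel."""
    return [[similarity_kernel(a, b, gamma) for b in graphs] for a in graphs]


# Toy disyllabic-word graphs: each layer holds a set of labelled edges (made-up labels).
word_a = {"morpheme": {("m1", "m2")}, "sense": {("s1", "s2")}, "category": {("N", "N")}}
word_b = {"morpheme": {("m1", "m2")}, "sense": {("s1", "s3")}, "category": {("N", "N")}}
word_c = {"morpheme": {("m3", "m4")}, "sense": {("s4", "s5")}, "category": {("V", "N")}}

K = kernel_matrix([word_a, word_b, word_c])
```

A matrix like `K` could then be passed to any classifier that accepts a precomputed kernel. For a general graph edit distance, a kernel of this exponential form is not guaranteed to be positive semidefinite, so this should be read as a sketch of the data flow rather than a validated kernel construction.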

