Abstract

In Chinese information processing, automatic classification of semantic word-formation patterns over a large-scale database can substantially improve unregistered-word identification, automatic lexicography, semantic analysis, and other applications. However, word-formation data contain noise, anomalies, nonlinear characteristics, class imbalance, and other uncertainties, under which the predictive performance of the multi-criteria optimization classifier (MCOC) and other traditional data mining approaches degrades rapidly. In this paper we put forward a novel MCOC with fuzzification, kernel, and penalty factors (FKP-MCOC) based on layered and weighted graph edit distance (GED). First, the layered and weighted GEDs between each semantic word-formation graph and a set of prototype graphs are calculated and used as the dissimilarity measure. Then the normalized GEDs are embedded into a new feature vector space, and an FKP-MCO classifier is built on that feature vector space to predict the patterns of semantic word-formation. Our experiments on Chinese word-formation analysis, together with a comparison against the support vector machine (SVM), show that the proposed approach increases the separation between different patterns and improves the prediction of the semantic pattern of a new compound word.
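
The embedding step described above (distances to prototype graphs, normalized, used as a feature vector) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `Graph`, `toy_distance`, and `embed` are hypothetical names, the distance is a crude weighted symmetric difference of labels standing in for the paper's layered and weighted GED (it is not a true graph edit distance), and dividing by the largest distance is only one possible normalization, since the abstract does not specify one.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Graph(NamedTuple):
    """Hypothetical labeled graph: a set of node labels and a set of labeled edges."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


def toy_distance(g: Graph, h: Graph, node_w: float = 1.0, edge_w: float = 0.5) -> float:
    """Crude stand-in for a weighted GED (assumption, not the paper's measure).

    Counts the nodes and edges present in exactly one of the two graphs
    and weights the two counts separately.
    """
    return node_w * len(g.nodes ^ h.nodes) + edge_w * len(g.edges ^ h.edges)


def embed(g: Graph, prototypes: List[Graph]) -> List[float]:
    """Map a graph to a vector of normalized distances to the prototypes.

    Normalization by the largest distance is an assumption of this sketch.
    """
    raw = [toy_distance(g, p) for p in prototypes]
    top = max(raw)
    return [d / top if top > 0 else 0.0 for d in raw]


# Two toy prototype graphs and one graph to embed.
p1 = Graph(frozenset({"a", "b"}), frozenset({("a", "b")}))
p2 = Graph(frozenset({"a", "c"}), frozenset({("a", "c")}))
g = Graph(frozenset({"a", "b"}), frozenset({("a", "b")}))

features = embed(g, [p1, p2])
```

The resulting `features` list has one coordinate per prototype, so any vector-space classifier can be trained on it; in the paper that classifier is FKP-MCOC, which this sketch does not attempt to reproduce.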
