Abstract
Kernel-based data transformation model for nonlinear classification of symbolic data
Highlights
Symbolic data, alternatively known as categorical data or nominal data, are widely used in real-world applications, where the attributes are represented by symbols, which are qualitative categories of things [1]
In the K2NN algorithm [38], which extends the conventional k-nearest neighbors (KNN) classifier, a weighted simple matching (SM) distance measure was derived based on kernel density estimation (KDE) on symbolic data; in [39], three new linear classifiers were defined for symbolic data classification and, interestingly, it was demonstrated that the classes can be made more separable by kernel learning of symbolic attributes
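The exact weighting scheme of K2NN is not reproduced here, but the underlying idea of a KNN classifier over a weighted simple matching distance can be sketched as follows. The function names and the fixed per-attribute weights are illustrative assumptions, not the paper's derivation (where the weights come from KDE on the symbolic attributes):

```python
from collections import Counter

def weighted_sm_distance(x, y, weights):
    # Weighted simple matching: sum the weights of attributes whose
    # symbols differ between the two objects (assumed weights here;
    # K2NN derives them from kernel density estimation).
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def knn_predict(query, data, labels, weights, k=3):
    # Rank training objects by weighted SM distance to the query,
    # then take a majority vote among the k nearest.
    ranked = sorted(range(len(data)),
                    key=lambda i: weighted_sm_distance(query, data[i], weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

With uniform weights this reduces to plain SM-distance KNN; non-uniform weights let informative attributes dominate the neighborhood ranking.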
This subsection aims to derive a Support Vector Machine (SVM) for non-linear classification of symbolic data, named SVM-S, using our new data transformation model KDTM and the inner product and distance measures formulated in the previous subsections
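The KDTM-induced inner product is defined in the paper itself; as a hypothetical stand-in, one way to obtain a kernel matrix directly from symbolic objects is to exponentiate a (negated) simple matching distance, which an SVM solver accepting precomputed Gram matrices could then consume. This is an illustrative sketch, not the paper's SVM-S formulation:

```python
import math

def sm_distance(x, y):
    # Plain simple matching distance: count of mismatched attributes.
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def symbolic_rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i][j] = exp(-gamma * SM(x_i, x_j)); this is a
    # product of per-attribute PSD kernels, hence a valid Mercer kernel,
    # though not the KDTM inner product derived in the paper.
    n = len(X)
    return [[math.exp(-gamma * sm_distance(X[i], X[j]))
             for j in range(n)] for i in range(n)]
```

Such a Gram matrix is symmetric with unit diagonal, so it can be passed to any SVM implementation that supports precomputed kernels.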
Summary
Symbolic data, alternatively known as categorical data or nominal data, are widely used in real-world applications, where the attributes are represented by symbols, which are qualitative categories of things [1]. A number of methods have been developed to classify symbolic data, including decision trees (DT), Naive Bayes (NB) [9], and distance-based methods such as the k-nearest neighbors (KNN) and prototype-based classifiers [10, 11]. Since both DT and NB are typically based on the assumption that symbolic attributes are conditionally independent given the class attribute, they cannot identify the non-linear correlation between attributes, which has been validated to be useful in high-quality classification [12, 13]. The non-linear Support Vector Machine (SVM) [18] makes use of Mercer kernel functions to embed raw objects into a reproducing kernel Hilbert space, such that the data can be classified in the new space with high quality. Such a method cannot be directly applied to non-linear symbolic data classification because, essentially, it is designed for numeric data, where the Mercer kernels and some key intermediate operations, such as the inner product, are well-defined. A popular solution to this problem is to transform symbolic data into numeric data as a preprocessing step, using a frequency estimation-based encoding model such as the well-known One-Hot encoding.
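The one-hot preprocessing mentioned above can be sketched in a few lines: each symbolic attribute is expanded into an indicator vector over its observed categories, and the vectors are concatenated. This is a minimal self-contained version (libraries such as scikit-learn provide equivalent encoders):

```python
def one_hot_encode(X):
    # Build the category vocabulary of each attribute, then map every
    # symbol to a 0/1 indicator vector and concatenate across attributes.
    n_attrs = len(X[0])
    cats = [sorted({row[j] for row in X}) for j in range(n_attrs)]
    encoded = []
    for row in X:
        vec = []
        for j, v in enumerate(row):
            vec.extend(1.0 if v == c else 0.0 for c in cats[j])
        encoded.append(vec)
    return encoded
```

The resulting numeric vectors can be fed to any standard SVM, at the cost of losing the relations between categories that kernel-based transformations aim to preserve.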