Abstract

Kernel-based data transformation model for nonlinear classification of symbolic data

Highlights

  • Symbolic data, alternatively known as categorical data or nominal data, are widely used in real-world applications, where the attributes are represented by symbols, i.e., qualitative categories of things [1]

  • In the K2NN algorithm [38], an extension of the conventional k-nearest neighbors (KNN) classifier, a weighted simple matching (SM) distance measure was derived from kernel density estimation (KDE) on symbolic data; in [39], three new linear classifiers were defined for symbolic data classification and, interestingly, it was demonstrated that classes can be made more separable by kernel learning of symbolic attributes

  • This subsection aims at deriving a Support Vector Machine (SVM) for non-linear classification of symbolic data, named SVM-S, using our new data transformation model KDTM and the inner product or distance measure formulated in the previous subsections (see the illustrative sketch after this list)
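
The two highlights above compress the technical core: a KDE-smoothed, attribute-weighted simple matching distance (the KNN side) and an inner product on symbolic data that lets a standard SVM run on a precomputed Gram matrix (the SVM-S side). The following Python sketch illustrates the general flavor only, under loud assumptions: the frequency-based weights, the smoothing, and helper names such as `sm_gram` are hypothetical stand-ins, not the paper's KDTM or its learned bandwidths.

```python
import numpy as np
from sklearn.svm import SVC

def category_freqs(X):
    """Per-attribute relative frequency of each symbol: a crude
    stand-in for discrete kernel density estimation."""
    n, d = X.shape
    return [{v: float(np.mean(X[:, j] == v)) for v in np.unique(X[:, j])}
            for j in range(d)]

def weighted_sm_distance(x, z, weights):
    """Weighted simple matching distance: sum of per-attribute
    weights over the attributes where x and z disagree."""
    return sum(w for xj, zj, w in zip(x, z, weights) if xj != zj)

def sm_gram(X, Z, weights):
    """Gram matrix of the matching 'inner product':
    k(x, z) = sum of weights over the agreeing attributes."""
    total = sum(weights)
    K = np.empty((len(X), len(Z)))
    for i, x in enumerate(X):
        for j, z in enumerate(Z):
            K[i, j] = total - weighted_sm_distance(x, z, weights)
    return K

# Toy symbolic data: rows are objects, columns are nominal attributes.
X_train = np.array([["red", "round"], ["red", "square"],
                    ["blue", "round"], ["blue", "square"]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([["red", "round"], ["blue", "square"]])

# Hypothetical attribute weights; a KDE-based method would learn them
# from smoothed category frequencies (here, purely for illustration,
# an attribute counts more when its symbols are less concentrated).
weights = [1.0 - max(f.values()) + 1e-3 for f in category_freqs(X_train)]

# SVM on symbolic data via a precomputed kernel matrix.
svm = SVC(kernel="precomputed")
svm.fit(sm_gram(X_train, X_train, weights), y_train)
print(svm.predict(sm_gram(X_test, X_train, weights)))  # -> [0 1]
```

Because the agreement count is a sum of indicator kernels, `sm_gram` is positive semi-definite, so `SVC(kernel="precomputed")` is legitimate; the same weighted distance could equally feed `KNeighborsClassifier(metric="precomputed")`, a rough analogue of the KNN extension cited in [38].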


Summary

Introduction

Symbolic data, alternatively known as categorical data or nominal data, are widely used in real-world applications, where the attributes are represented by symbols, i.e., qualitative categories of things [1]. A number of methods have been developed to classify symbolic data, including decision trees (DT), Naive Bayes (NB) [9], and distance-based methods such as the k-nearest neighbors (KNN) and prototype-based classifiers [10, 11]. Since both DT and NB are typically based on the assumption that symbolic attributes are conditionally independent given the class attribute, they cannot identify non-linear correlations between attributes, which have been validated to be useful for high-quality classification [12, 13]. The non-linear Support Vector Machine (SVM) [18] makes use of Mercer kernel functions to embed raw objects into a reproducing kernel Hilbert space, such that the data can be classified in the new space with high quality. Such a method cannot be directly applied to non-linear symbolic data classification because, essentially, it is designed for numeric data, where the Mercer kernels and some key intermediate operations, such as the inner product, are well defined. A popular solution to this problem is to transform symbolic data into numeric data as a preprocessing step, using a frequency estimation-based encoding model such as the well-known One-Hot encoding.
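
As a concrete instance of the preprocessing route this paragraph ends on, the sketch below one-hot encodes a toy symbolic table and trains an RBF SVM on the result; the data and pipeline are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Toy symbolic data: each attribute takes qualitative values.
X = np.array([["red", "round"], ["red", "square"],
              ["blue", "round"], ["blue", "square"]])
y = np.array([0, 0, 1, 1])

# One-hot encoding maps each symbol to a 0/1 indicator column,
# turning symbolic attributes into numeric vectors that a standard
# non-linear SVM (here with an RBF kernel) can consume.
clf = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                    SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(np.array([["blue", "round"]])))
```

Note that one-hot encoding places all distinct symbols at equal distance and discards their estimated frequencies, which motivates the kernel-based transformation model developed in the paper.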

Corresponding author: Lifei Chen
A sampling of classification methods for symbolic data
Data transformation methods
Kernel learning on symbolic data
Discrete kernel estimation
Bandwidth optimization
Kernel-based self-representation model
Inner product and distance measures of symbolic data
SVM-S: SVM for symbolic data
Data sets and experimental setup
Classification performance
Attribute-weight analysis
Findings
Concluding remarks
