Abstract
Kernel-based methods have become popular in machine learning; however, they are typically formulated in vector spaces and therefore do not directly apply to categorical data. In this paper, we propose a new kind of kernel trick, showing that the mapping of categorical samples into kernel spaces can be equivalently described as assigning a kernel-based weight to each categorical attribute of the input space, so that common distance measures can be employed. A data-driven approach to kernel bandwidth selection is then proposed, which optimizes the feature weights. We also use the kernel-based distance measure to extend nearest-neighbor classification to categorical data. Experimental results on real-world data sets show that this approach substantially outperforms classification in the original input space.
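To make the idea concrete, the following is a minimal sketch of the weighted-distance view described above: per-attribute weights turn a plain categorical mismatch count into a weighted distance, which then drives nearest-neighbor classification. The weight values, the toy data, and the function names here are illustrative assumptions; in the paper the weights would be derived from the kernel bandwidths in a data-driven way, not fixed by hand.

```python
import numpy as np
from collections import Counter

def weighted_categorical_distance(x, y, weights):
    """Weighted mismatch (Hamming-style) distance between two categorical
    samples; `weights` plays the role of the per-attribute kernel-based
    weights described in the abstract (hypothetical values here)."""
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def knn_predict(X_train, y_train, x_query, weights, k=3):
    """Classify x_query by majority vote among its k nearest training
    samples under the weighted categorical distance."""
    dists = [weighted_categorical_distance(x, x_query, weights) for x in X_train]
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: three categorical attributes (color, shape, size).
X_train = [("red", "round", "small"),
           ("red", "square", "small"),
           ("blue", "round", "large"),
           ("blue", "square", "large")]
y_train = ["A", "A", "B", "B"]

# Hypothetical attribute weights; in the paper these would be learned
# from the data via kernel bandwidth optimization.
weights = [0.6, 0.1, 0.3]

print(knn_predict(X_train, y_train, ("red", "round", "large"), weights, k=3))
# → A
```

With uniform weights the query would be equidistant from both classes on two of the four training points; the learned (here, hand-set) weights break that tie by emphasizing the attributes that matter, which is the practical payoff of the kernel-based weighting.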