Abstract

This paper studies how data representation influences the generalization error of kernel-based learning machines such as support vector machines (SVMs). We analyse the effects of sparse and dense data representations on the generalization error of SVMs. We show that with sparse representations, the performance of classifiers in hypothesis spaces induced by polynomial or Gaussian kernel functions reduces to that of linear classifiers. Sparse representations reduce the generalization error as long as the representation is not too sparse, as happens with very large dictionaries; dense representations reduce the generalization error even with very large dictionaries. We use two schemes for representing data in data-independent overcomplete Haar and Gabor dictionaries, and measure the generalization error of SVMs on benchmark datasets. Finally, we study sparse and dense representations in the case of data-dependent overcomplete dictionaries and show how this leads to principal component analysis (PCA).
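To make the experimental setup concrete, the following is a minimal sketch (not the authors' exact pipeline): it encodes data in a random overcomplete dictionary, both sparsely via orthogonal matching pursuit and densely via full projections onto all atoms, and compares SVM test accuracy on each representation. The random dictionary, the synthetic dataset, and all parameter values are illustrative assumptions; the paper uses Haar and Gabor dictionaries on benchmark datasets.

```python
# Illustrative sketch only: random overcomplete dictionary instead of the
# paper's Haar/Gabor dictionaries, synthetic data instead of benchmarks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import SparseCoder
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=32, random_state=0)

# Overcomplete dictionary: more atoms than input dimensions (rows = atoms).
n_atoms = 128
D = rng.standard_normal((n_atoms, X.shape[1]))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms

# Sparse representation: each sample approximated by a few atoms (OMP).
coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=8)
X_sparse = coder.transform(X)

# Dense representation: inner products with every atom, no sparsity constraint.
X_dense = X @ D.T

for name, Z in [("sparse", X_sparse), ("dense", X_dense)]:
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
    clf = SVC(kernel="rbf").fit(Z_tr, y_tr)
    print(f"{name} representation: test accuracy = {clf.score(Z_te, y_te):.3f}")
```

Varying `n_atoms` and `transform_n_nonzero_coefs` in this sketch mimics the dictionary-size and sparsity-level comparisons discussed in the abstract.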
