Kernel principal component analysis for multimedia retrieval

Guang-Ho Cha

doi:10.18844/gjit.v6i1.384

Abstract

Principal component analysis (PCA) is an important tool in many areas, including data reduction and interpretation, information retrieval, and image processing. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components are computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to both retrieval quality and retrieval precision in content-based image retrieval.

Keywords: Principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval

Highlights

An explosion in the amount of digital image data has brought about the need for image retrieval systems.
Motivated by the dimensionality curse, researchers have attempted to reduce the dimensionality of image feature vectors with dimensionality reduction techniques such as principal component analysis [4, 7].
We investigate the potential of a nonlinear form of principal component analysis (PCA) for dimensionality reduction and feature extraction in content-based image retrieval.

Summary

Introduction

An explosion in the amount of digital image data has brought about the need for image retrieval systems. This demand has made image retrieval a very active research area in recent years. The returned images should be similar to the query image. This similarity (or nearest neighbor) indexing/retrieval problem can be solved efficiently when the feature vectors have low or medium dimensionalities (e.g., less than 8) by the use of existing indexing methods such as the R*-tree [1] and the HG-tree [2]. For high dimensionalities, in theory or in practice, the performance of existing indexing methods degenerates to being worse than that of the brute-force sequential scan, which compares the query object to each data object [3]. Motivated by this dimensionality curse, researchers have attempted to reduce the dimensionality of image feature vectors with dimensionality reduction techniques such as principal component analysis [4, 7].
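
To make the idea concrete, the following is a minimal sketch of kernel PCA with a Gaussian kernel, as outlined in the abstract, followed by a brute-force nearest-neighbor scan in the reduced space. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names (`gaussian_kernel`, `fit_kernel_pca`, `transform`, `nearest_neighbors`), the kernel width `sigma`, the number of components, and the random stand-in feature vectors are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def fit_kernel_pca(X, n_components, sigma):
    """Compute the leading principal components in the kernel feature space."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    one_n = np.full((n, n), 1.0 / n)
    # Center the data in feature space (done implicitly on the kernel matrix).
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Scale coefficients so each feature-space component has unit norm.
    alpha = alpha / np.sqrt(lam)
    return {"X": X, "K": K, "alpha": alpha, "sigma": sigma}

def transform(model, Y):
    """Project the rows of Y onto the learned kernel principal components."""
    X, K, alpha, sigma = model["X"], model["K"], model["alpha"], model["sigma"]
    n = X.shape[0]
    Ky = gaussian_kernel(Y, X, sigma)
    one_mn = np.full((Ky.shape[0], n), 1.0 / n)
    one_n = np.full((n, n), 1.0 / n)
    # Center the new kernel rows consistently with the training data.
    Kyc = Ky - one_mn @ K - Ky @ one_n + one_mn @ K @ one_n
    return Kyc @ alpha

def nearest_neighbors(Z, q, k):
    """Brute-force sequential scan: indices of the k rows of Z closest to q."""
    dists = np.linalg.norm(Z - q, axis=1)
    return np.argsort(dists)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 64))   # stand-in for image feature vectors
    model = fit_kernel_pca(features, n_components=8, sigma=8.0)
    reduced = transform(model, features)    # 64 dimensions reduced to 8
    query = transform(model, features[:1])[0]
    print(nearest_neighbors(reduced, query, k=5))
```

In this sketch the reduced vectors are scanned sequentially; they could equally be handed to a multidimensional index, since the point of the reduction is to bring the dimensionality down to a range where such indexes remain effective. The choice of `sigma` strongly affects the result and would need to be tuned for real image features.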

Objectives

Results

Conclusion