Abstract

Good image representations are essential for many computer vision tasks. In this study, the authors propose a hierarchical spatial pyramid max pooling method based on scale-invariant feature transform (SIFT) features and sparse coding, which builds image representations through a hierarchical network. The method has three parts: SIFT feature extraction, sparse coding and spatial pyramid max pooling. To mimic the visual cortex, spatial pyramid max pooling is first performed on the original SIFT features within each image patch. This distils the features and extracts the most distinctive and significant feature in each local patch, the SIFT-pooled feature, which is used in place of the original SIFT features. Then, a dictionary is trained on a random subset of the SIFT-pooled features with the K-singular value decomposition (K-SVD) algorithm, and all SIFT-pooled features are sparse coded using the trained dictionary. Finally, spatial pyramid max pooling is carried out again at the image level on the sparse codes of all image patches. The image representation is built by concatenating the pooled features of each level. The authors use the algorithm with a simple linear support vector machine (SVM) for image classification on three datasets, Caltech-101, Caltech-256 and 15-Scenes, and the experimental results show that the authors' algorithm achieves performance competitive with recently published results.
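The image-level pooling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spatial_pyramid_max_pool`, the default grid sizes, the use of patch centres as positions, and the choice to pool absolute values of the codes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, positions, image_size, grids=(1, 2, 4)):
    """Max-pool per-patch codes over a spatial pyramid and concatenate the results.

    codes:      (N, K) array, one K-dimensional code per patch
    positions:  (N, 2) array of patch centres as (x, y) pixel coordinates
    image_size: (width, height) of the image in pixels
    grids:      cells per side at each pyramid level (hypothetical default)
    """
    # Assumption: pool the magnitudes of the codes, so signs are discarded.
    codes = np.abs(np.asarray(codes, dtype=float))
    positions = np.asarray(positions, dtype=float)
    width, height = image_size
    n_atoms = codes.shape[1]
    pooled = []
    for g in grids:
        # Map each patch centre to a cell index in a g x g grid.
        col = np.minimum((positions[:, 0] / width * g).astype(int), g - 1)
        row = np.minimum((positions[:, 1] / height * g).astype(int), g - 1)
        cell = row * g + col
        level = np.zeros((g * g, n_atoms))
        # Element-wise max per cell; cells with no patches stay at zero.
        np.maximum.at(level, cell, codes)
        pooled.append(level.ravel())
    # Concatenate the pooled features of every level into one vector.
    return np.concatenate(pooled)
```

With K dictionary atoms, the returned vector has K times the total number of cells across all levels, so the assumed grids of 1, 2 and 4 cells per side give a 21K-dimensional representation that could then be passed to a linear classifier.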
