Abstract

The standard bag of visual words model ignores the spatial information contained in the image, but researchers have demonstrated that object recognition performance can be improved by including spatial information. A state-of-the-art approach is the spatial pyramid representation, which divides the image into spatial bins. In this paper, another general approach that encodes the spatial information more effectively and efficiently is described. The proposed approach is to embed the spatial information into a kernel function termed the Spatial Non-Alignment Kernel (SNAK). For each visual word, the average position and the standard deviation are computed based on all the occurrences of the visual word in the image. These are computed with respect to the center of the object, which is determined with the help of the objectness measure. The pairwise similarity of two images is then computed by taking into account the difference between the average positions and the difference between the standard deviations of each visual word in the two images. In other words, SNAK includes the spatial distribution of the visual words in the similarity of two images. Furthermore, various kernel functions can be plugged into the SNAK framework. Object recognition experiments are conducted to compare the SNAK framework with the spatial pyramid representation, and to assess the performance improvements for various state-of-the-art kernels on two benchmark data sets. The empirical results indicate that SNAK significantly improves the object recognition performance of every evaluated kernel. Compared to the spatial pyramid, SNAK improves performance while consuming less space and time. In conclusion, SNAK can be considered a good candidate to replace the widely used spatial pyramid representation.

Keywords: Kernel method, Spatial information, Bag of visual words
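The per-word statistics described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the exact SNAK formula, so the exponential combination in `spatial_similarity` is an assumption chosen only for illustration, as are the function names. The object center is taken as an input here, whereas the paper derives it from the objectness measure.

```python
import math
from collections import defaultdict


def word_spatial_stats(word_ids, positions, center):
    """Map each visual word to (mean offset, std deviation) relative to `center`.

    word_ids:  one visual word index per detected feature
    positions: one (x, y) location per detected feature
    center:    (x, y) object center, assumed to be given
    """
    cx, cy = center
    grouped = defaultdict(list)
    for w, (x, y) in zip(word_ids, positions):
        grouped[w].append((x - cx, y - cy))

    stats = {}
    for w, pts in grouped.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Population standard deviation along each axis.
        sx = math.sqrt(sum((p[0] - mx) ** 2 for p in pts) / n)
        sy = math.sqrt(sum((p[1] - my) ** 2 for p in pts) / n)
        stats[w] = ((mx, my), (sx, sy))
    return stats


def spatial_similarity(stats_a, stats_b):
    """Illustrative similarity (an assumption, not the published SNAK formula).

    Each visual word present in both images contributes a value in (0, 1]
    that shrinks as the mean positions and standard deviations diverge.
    """
    total = 0.0
    for w in stats_a.keys() & stats_b.keys():
        (mean_a, std_a), (mean_b, std_b) = stats_a[w], stats_b[w]
        mean_gap = math.dist(mean_a, mean_b)
        std_gap = math.dist(std_a, std_b)
        total += math.exp(-(mean_gap + std_gap))
    return total
```

In this sketch, two images whose shared visual words occupy the same positions relative to the object center score highest, and the score drops as those positions drift apart. Storing only a mean and a standard deviation per visual word keeps the representation compact.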
