Abstract

In computer vision, Local Binary Pattern (LBP) and Scale Invariant Feature Transform (SIFT) are two widely used local descriptors. In this paper, we propose an effective way to combine them for scene categorization. First, LBP and SIFT features are extracted at regularly spaced positions from the training images and used to construct an LBP codebook and a SIFT codebook. The two codebooks are then combined into a two-dimensional table. To represent an image, the LBP and SIFT features extracted at the same positions are encoded jointly by sparse coding over this two-dimensional table. After all features in the image have been encoded, spatial max pooling is applied to obtain the image representation, which is then fed to a Support Vector Machine (SVM) classifier for categorization. To further improve categorization performance, we also propose a method for selecting correlated visual words from large codebooks when constructing the two-dimensional table. Finally, extensive experiments on the Scene Categories 8, Scene Categories 15 and MIT 67 Indoor Scene datasets demonstrate that the proposed method is effective for scene categorization.
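The abstract does not specify how the two-dimensional table is built or which sparse-coding solver is used, so the following is only a minimal NumPy sketch under our own assumptions. Each table entry (i, j) is assumed to be the L2-normalized concatenation of LBP word i and SIFT word j. The concatenated LBP+SIFT feature at each position is coded over these K1 x K2 joint words with ISTA, a standard Lasso solver swapped in for illustration. Pooling assumes a 1x1/2x2/4x4 spatial pyramid that takes the maximum absolute code value in each cell. The function names, codebook sizes and parameters (`lam`, `n_iter`, `levels`) are all hypothetical.

```python
import numpy as np

def build_joint_table(lbp_codebook, sift_codebook):
    """Hypothetical 2-D table: row i*K2 + j is LBP word i concatenated with SIFT word j."""
    k1, _ = lbp_codebook.shape
    k2, _ = sift_codebook.shape
    lbp_part = np.repeat(lbp_codebook, k2, axis=0)  # row i*k2 + j -> LBP word i
    sift_part = np.tile(sift_codebook, (k1, 1))     # row i*k2 + j -> SIFT word j
    table = np.hstack([lbp_part, sift_part])
    # Unit-norm atoms, as is usual for sparse-coding dictionaries (assumption).
    norms = np.maximum(np.linalg.norm(table, axis=1, keepdims=True), 1e-12)
    return table / norms

def sparse_code(x, table, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - table.T @ a||^2 + lam*||a||_1 (atoms are rows)."""
    step = 1.0 / np.linalg.norm(table, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(table.shape[0])
    for _ in range(n_iter):
        grad = table @ (table.T @ a - x)
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft-thresholding
    return a

def spatial_max_pool(codes, positions, width, height, levels=(1, 2, 4)):
    """Max of |code| in every cell of a spatial pyramid; cells are concatenated."""
    pooled = []
    for g in levels:
        cx = np.minimum((positions[:, 0] * g // width).astype(int), g - 1)
        cy = np.minimum((positions[:, 1] * g // height).astype(int), g - 1)
        for r in range(g):
            for c in range(g):
                mask = (cy == r) & (cx == c)
                if mask.any():
                    pooled.append(np.abs(codes[mask]).max(axis=0))
                else:
                    pooled.append(np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

# Toy example with random data: 4 LBP words (59-D) and 5 SIFT words (128-D).
rng = np.random.default_rng(0)
lbp_cb = rng.random((4, 59))
sift_cb = rng.random((5, 128))
table = build_joint_table(lbp_cb, sift_cb)  # 20 joint visual words

n, width, height = 30, 64, 48
positions = rng.random((n, 2)) * [width, height]            # (x, y) of each patch
feats = np.hstack([rng.random((n, 59)), rng.random((n, 128))])  # LBP+SIFT at same positions
codes = np.array([sparse_code(f, table) for f in feats])
rep = spatial_max_pool(codes, positions, width, height)
print(rep.shape)  # (420,) = 20 words x (1 + 4 + 16) cells
```

In this sketch, the pooled vectors would then be passed to an SVM (for example a linear SVM) for training and classification. The proposed selection of correlated visual words from large codebooks is not modeled here; the toy codebooks are simply used in full.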
