Abstract
Fisher scores have been shown to be accurate global image features for classification. However, their performance is very dependent on the quality of the input features as well as the normalization steps applied to them. In this paper, we propose to embed the Fisher scores in an end-to-end trainable deep network by concentrating on two elements: adapting the encoding to the deep features and normalizing the extracted second-order statistics. Therefore, we make use of a deep sparse coding module that allows to sample the center of each Gaussian function from a learned subspace and thus to better fit the high dimensional data distribution. Second, we introduce a new normalization module that computes an approximate square root matrix normalization well adapted to the Fisher scores. These processing steps are embedded in a deep network so that all the modules work together for the sole purpose of improving classification performance. Experimental results show that this solution outperforms many alternatives in the context of material, indoor scene and fine-grained image classification.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have