Abstract
Recent work on image categorization, especially scene categorization, shows that combining standard visible RGB image data with near-infrared (NIR) image data outperforms using RGB data alone. However, RGB-NIR image collections are often small because such data are difficult to acquire, and with limited data it is hard to extract effective features using common deep learning networks. Humans, by contrast, can draw on prior knowledge learned from other tasks or from a good mentor, which helps them solve learning problems with few training samples. Inspired by this observation, we propose a novel training methodology that introduces prior knowledge into a deep architecture, allowing us to bypass the burdensome labeling of large quantities of image data otherwise required by deep learning. First, transfer learning is adopted to learn single-modal features from a large source database, such as ImageNet. Then, a knowledge distillation method is explored to fuse the RGB and NIR features. Finally, a global optimization method is employed to fine-tune the entire network. Experimental results on two RGB-NIR datasets demonstrate the effectiveness of the proposed approach in comparison with state-of-the-art multi-modal image categorization methods.
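The distillation step above can be sketched as a standard soft-target loss: the student is trained against a temperature-softened teacher distribution plus the usual hard-label cross-entropy. The temperature `T`, weight `alpha`, and the NumPy formulation below are illustrative assumptions for exposition, not the paper's exact fusion objective.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a soft KL term (student vs. teacher at temperature T)
    and a hard cross-entropy term against the ground-truth labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), averaged over the batch; T^2 rescales the gradient
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    soft = (T ** 2) * kl.mean()
    # standard cross-entropy on unscaled (T = 1) student outputs
    p_hard = softmax(student_logits, 1.0)
    hard = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

In the fusion setting, one modality's network (or a joint teacher) supplies `teacher_logits` while the other modality's branch plays the student; when the student matches the teacher exactly, the soft term vanishes and only the hard-label term remains.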
Highlights
In the past several decades, numerous computer vision methods have been developed to process visible RGB images
Recent studies demonstrate that using a larger spectrum of radiation than RGB alone yields better performance in many computational vision tasks, such as saliency detection [1, 2], scene categorization [3, 4], and image segmentation [5, 6]
To address RGB-NIR image classification, we propose a feature fusion method based on knowledge distillation (FFKD) in this paper
Summary
In the past several decades, numerous computer vision methods have been developed to process visible RGB images. Recent studies demonstrate that using a larger spectrum of radiation than RGB alone yields better performance in many computational vision tasks, such as saliency detection [1, 2], scene categorization [3, 4], and image segmentation [5, 6]. RGB-NIR image categorization remains one of the most challenging of these tasks for two reasons. Some "shallow" features used in traditional methods, e.g., the Scale-Invariant Feature Transform (SIFT) [7], Gist [8], and the census transform histogram (CENTRIST) [9], may not need any labeled data, because, in terms of the biology of color vision, these features correspond to the lowest level of processing in the hierarchical visual system
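As an illustration of such a "shallow", label-free feature, the CENTRIST descriptor [9] can be approximated as a histogram of census-transform codes, where each pixel is encoded by comparing it with its eight neighbours. The NumPy sketch below is a simplified illustration of the idea, not the authors' reference implementation.

```python
import numpy as np

def census_transform(img):
    """8-bit census transform: each interior pixel is encoded by comparing it
    with its 8 neighbours (bit set to 1 if neighbour <= centre, else 0)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the centre pixel itself
            neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            out |= (neigh <= centre).astype(np.uint8) << bit
            bit += 1
    return out

def centrist(img, bins=256):
    """CENTRIST-style descriptor: normalized histogram of census codes."""
    codes = census_transform(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return hist / hist.sum()
```

No labels are involved at any point: the descriptor is a fixed, purely local transform of the image, which is what makes such shallow features usable when labeled RGB-NIR data are scarce.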