Fine-Grained Categorization From RGB-D Images

Yanhao Tan, Mohammad Muntasir Rahman, Ke Lu, Ling Shao, Yanfu Yan, Jian Xue

doi:10.1109/tmm.2021.3061284

Abstract

In computer vision, fine-grained visual categorization has attracted considerable attention and made great progress, driven by convolutional neural networks and a large number of publicly available datasets. With next-generation sensing technology, RGB-D cameras can provide high-quality synchronized RGB and depth images for solving many computer vision problems. Although RGB-D cameras have been used for multi-view object category detection and scene understanding, they have not been widely used in fine-grained classification. In this paper, we introduce RGBD-FG, a multi-view RGB-D dataset for fine-grained categorization. Currently, the dataset contains 93 051 RGB-D images covering 19 super-categories and 50 sub-categories of common vegetables and fruit, organized hierarchically. We provide extensive experimental results to establish state-of-the-art benchmarks for our dataset, illustrating its diversity and the scope for improvement in future work. We also propose a novel modality-specific multimodal network, the FS-Multimodal network, which can address two limitations of multimodal networks trained by fine-tuning: over-fitting and the lack of effective depth-specific features. We hope that our study lays the foundations for fine-grained categorization of RGB-D data.

Full Text