Abstract

In computer vision, fine-grained visual categorization has attracted considerable attention and made great progress, driven by convolutional neural networks and a large number of publicly available datasets. With next-generation sensing technology, RGB-D cameras can provide high-quality, synchronized RGB and depth images for solving many computer vision problems. Although RGB-D cameras have been used for multi-view object category detection and scene understanding, they have not been widely used in fine-grained classification. In this paper, we introduce RGBD-FG, a multi-view RGB-D dataset for fine-grained categorization. The dataset currently contains 93,051 RGB-D images of common vegetables and fruit, covering 19 super-categories and 50 sub-categories organized hierarchically. We provide extensive experimental results that establish state-of-the-art benchmarks on the dataset, illustrating its diversity and the scope for improvement in future work. We also propose a novel modality-specific multimodal network, the FS-Multimodal network, which addresses two limitations of multimodal networks trained by fine-tuning: over-fitting and the lack of effective depth-specific features. We hope that our study lays the foundations for fine-grained categorization of RGB-D data.
