Abstract

With the rapid development of 3D technology in recent years, view-based methods have shown excellent performance in both 3D model classification and retrieval tasks. A key issue in view-based methods is how to aggregate multi-view features. Existing methods commonly adopt one of two solutions: 1) pooling strategies that merge multi-view features but ignore the context information contained in the continuous view sequence, or 2) grouping strategies or long short-term memory (LSTM) networks that select representative views of the 3D model but easily neglect the semantic information of individual views. In this paper, we propose a novel Semantic and Context information Fusion Network (SCFN) to compensate for these drawbacks. First, we render views from multiple perspectives of the 3D model and extract a raw feature for each individual view with a 2D convolutional neural network (CNN). Then we design a channel attention mechanism (CAM) to exploit view-wise semantic information: by modeling the correlation among view feature channels, we assign higher weights to useful feature attributes while suppressing useless ones. Next, we propose a context information fusion module (CFM) that fuses the multiple view features into a compact 3D representation. Extensive experiments on three popular datasets, i.e., ModelNet10, ModelNet40, and ShapeNetCore55, demonstrate the superiority of the proposed method over state-of-the-art approaches on both 3D classification and retrieval tasks.
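This summary does not spell out the CAM's internal layout, so the following is only a minimal sketch of a squeeze-and-excitation style channel attention applied to per-view features, to illustrate the idea of re-weighting feature channels. The class name ChannelAttention, the reduction ratio, and the (num_views, channels) input shape are illustrative assumptions, not the authors' design.

```python
# Minimal sketch of squeeze-and-excitation style channel attention,
# shown only to illustrate re-weighting of view feature channels;
# the paper's actual CAM may differ in structure and details.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):          # hypothetical name
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                   # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_views, channels) raw view features from the 2D CNN
        weights = self.fc(x)                # model channel correlations
        return x * weights                  # boost useful channels, damp the rest


# Usage: re-weight 12 view features of dimension 512.
views = torch.randn(12, 512)
attended = ChannelAttention(512)(views)     # same shape, re-weighted
```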

Highlights

  • With the wide application of 3D technology in virtual reality, 3D printing, medical diagnosis, and other fields [1]–[4], the number of 3D models is proliferating, and 3D model classification and retrieval tasks have consequently received a surge of attention

  • Extensive experiments are conducted on three popular datasets, i.e., ModelNet10, ModelNet40, and ShapeNetCore55, demonstrating the superiority of the proposed method over state-of-the-art approaches on both 3D classification and retrieval tasks

  • Unlike previous view-based 3D model analysis methods, which focus only on multi-view feature aggregation and neglect the feature representation capability of individual views, we propose a channel attention mechanism (CAM) to enhance the useful semantic information contained in individual views

Summary

Introduction

With the wide application of 3D technology in virtual reality, 3D printing, medical diagnosis, and other fields [1]–[4], the number of 3D models is proliferating, and 3D model classification and retrieval tasks have consequently received a surge of attention. The most critical step in these tasks is to learn a discriminative 3D model descriptor. Current 3D model descriptor extraction methods fall into two mainstreams: model-based methods and view-based methods. View-based methods [12]–[19] usually first place virtual cameras around the 3D model to obtain multiple views, extract features from each view with a 2D CNN, and then fuse those view features into a compact 3D model descriptor. Owing to the remarkable progress of deep learning in the 2D image recognition field [20], [21], view-based methods have proved more successful than model-based methods.
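To make the generic view-based pipeline concrete, here is a minimal sketch of the baseline it describes: a shared 2D CNN extracts one feature per rendered view, and element-wise max pooling fuses them into a single descriptor, in the spirit of MVCNN-style methods. The ResNet-18 backbone, the view count, and the class name MultiViewDescriptor are assumptions for illustration; SCFN itself replaces the plain pooling step with its CAM and CFM modules.

```python
# Sketch of the generic view-based pipeline: per-view CNN features
# followed by max-pooling fusion. Backbone choice and view count are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiViewDescriptor(nn.Module):        # hypothetical name
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # Drop the classification head; keep the 512-d feature extractor.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (num_views, 3, H, W) images rendered around the model
        feats = self.cnn(views).flatten(1)   # (num_views, 512)
        # Element-wise max pooling across views: the pooling strategy
        # the abstract criticizes for discarding view-sequence context.
        return feats.max(dim=0).values       # (512,) model descriptor


# Usage: fuse 12 rendered views into one compact descriptor.
descriptor = MultiViewDescriptor()(torch.randn(12, 3, 224, 224))
```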

