Abstract

Recognition of 3D shapes is a fundamental task in computer vision. In recent years, view-based deep learning has emerged as an effective approach to 3D shape recognition. Most existing view-based methods treat the views of an object as an unordered set, which ignores the dynamic relations among the views, e.g. sequential semantic dependencies. In this paper, we model the views of an object as a sequence and exploit the long-term dependencies among different views for shape recognition by constructing a sequence-aware view aggregation module based on the bi-directional Long Short-Term Memory network. We show that our view aggregation module not only captures the bi-directional dependencies in view sequences but is also robust to circular shifts of the input sequences. Incorporating the aggregation module into a standard convolutional network architecture, we develop an effective method for 3D shape classification and retrieval. We evaluated our method on the ModelNet40/10 and ShapeNetCore55 datasets. The results show an encouraging performance gain from exploiting long-term dependencies in view sequences, as well as superior performance compared with existing methods.

Highlights

  • Understanding 3D objects has been a fundamental problem since the establishment of computer vision, with a broad spectrum of applications including multimedia [1], augmented reality [2], [3], entertainment [4], robotics [5], [6], autonomous driving [7]–[10], 3D reverse engineering [11], [12], medical imaging [13], [14], and monitoring [15]

  • Inspired by the great success of deep learning in image classification [16], [17], many approaches (e.g. [18]–[22]) to 3D shape recognition have been proposed based on neural networks (NNs)

  • To exploit the long-term dependencies of view sequences for 3D shape recognition, we propose a sequence-aware view aggregation module based on the long short-term memory (LSTM) [26]

Summary

INTRODUCTION

Understanding 3D objects has been a fundamental problem since the establishment of computer vision, with a broad spectrum of applications including multimedia [1], augmented reality [2], [3], entertainment [4], robotics [5], [6], autonomous driving [7]–[10], 3D reverse engineering [11], [12], medical imaging [13], [14], and monitoring [15]. To exploit the long-term dependencies of view sequences for 3D shape recognition, we propose a sequence-aware view aggregation module based on long short-term memory (LSTM) [26]. Building this module into a standard view-based CNN yields an effective method for 3D shape recognition. Specifically, we treat the views of an object as a sequence and investigate how the bi-directional long-term dependencies of view sequences can be exploited for 3D shape recognition. We then propose an effective view-based CNN with a bi-directional LSTM-based aggregation module for 3D shape classification and retrieval. The proposed network analyzes the long-term semantic dependencies of view sequences along both directions, recognizes complex shapes, and is robust to circular shifts of the view sequence.
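To make the idea of sequence-aware view aggregation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It runs one standard LSTM over the per-view features in each direction, concatenates the two hidden states for every view, and pools over views to obtain a single shape descriptor. The weights are random rather than trained, and the names (make_lstm_params, aggregate_views), the dimensions, and the final element-wise max pooling are all assumptions made for illustration; the summary above does not specify how the module combines the per-view states.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_params(input_dim, hidden_dim, rng):
    """Random (untrained) weights for one LSTM direction, one (W, b) per gate."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for gate in ("input", "forget", "output", "cell")}

def affine(W, b, v):
    """Compute W v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(params, x, h, c):
    """One standard LSTM step on input x with previous state (h, c)."""
    z = x + h  # list concatenation, i.e. the stacked vector [x; h]
    i = [sigmoid(a) for a in affine(*params["input"], z)]
    f = [sigmoid(a) for a in affine(*params["forget"], z)]
    o = [sigmoid(a) for a in affine(*params["output"], z)]
    g = [math.tanh(a) for a in affine(*params["cell"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, seq, hidden_dim):
    """Return the hidden state after each element of seq."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in seq:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def aggregate_views(view_features, fwd_params, bwd_params, hidden_dim):
    """Bi-directional aggregation of an ordered list of per-view feature vectors."""
    fwd = run_lstm(fwd_params, view_features, hidden_dim)
    # Backward pass: process the views in reverse, then realign with the forward pass.
    bwd = run_lstm(bwd_params, view_features[::-1], hidden_dim)[::-1]
    per_view = [f + b for f, b in zip(fwd, bwd)]  # concatenate both directions
    # Assumed pooling: element-wise max over views gives one descriptor per shape.
    return [max(column) for column in zip(*per_view)]

# Hypothetical sizes: 12 views, 8-dimensional CNN features, 4 hidden units.
rng = random.Random(0)
NUM_VIEWS, FEATURE_DIM, HIDDEN_DIM = 12, 8, 4
views = [[rng.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_VIEWS)]
fwd_params = make_lstm_params(FEATURE_DIM, HIDDEN_DIM, rng)
bwd_params = make_lstm_params(FEATURE_DIM, HIDDEN_DIM, rng)
descriptor = aggregate_views(views, fwd_params, bwd_params, HIDDEN_DIM)
print(len(descriptor))
```

In a real pipeline the random vectors in views would be replaced by the features a CNN extracts from each rendered view, and the descriptor would feed a classifier or a retrieval index. Unlike pooling over an unordered set, the per-view states here depend on the order in which the views are presented.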

RELATED WORK
CONVOLUTIONAL VIEW FEATURE EXTRACTION
SEQUENCE-BASED VIEW AGGREGATION
EXPERIMENTS
Findings
CONCLUSION
