Abstract

Recognition of 3D shapes is a fundamental task in computer vision. In recent years, view-based deep learning has emerged as an effective approach to 3D shape recognition. Most existing view-based methods treat the views of an object as an unordered set, which ignores the dynamic relations among the views, such as sequential semantic dependencies. In this paper, we model the views of an object as a sequence and exploit the long-term dependencies among different views for shape recognition. To this end, we construct a sequence-aware view aggregation module based on the bi-directional Long Short-Term Memory (LSTM) network. We show that this module not only captures the bi-directional dependencies in view sequences but is also robust to circular shifts of the input sequence. By incorporating the aggregation module into a standard convolutional network architecture, we develop an effective method for 3D shape classification and retrieval. We evaluated the method on the ModelNet40, ModelNet10, and ShapeNetCore55 datasets. The results show an encouraging performance gain from exploiting long-term dependencies in view sequences, as well as superior performance compared with existing methods.
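For readers who want the idea in symbols, the following is a generic formulation, not necessarily the paper's own notation or design. Let x_1, ..., x_N be the CNN features of the N rendered views in viewpoint order. A forward and a backward LSTM scan the sequence, their states are concatenated per view, and the per-view states are pooled into one descriptor z. The element-wise max pooling and the circular shift operator S_k are assumptions introduced for illustration.

```latex
% Standard LSTM cell: input x_t, previous hidden state h_{t-1}, previous cell state c_{t-1}
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i),\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o),\\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
% Bi-directional scan over N views (cell states carried implicitly;
% initial states \overrightarrow{h}_0 = \overleftarrow{h}_{N+1} = 0), then pooling (assumed: max)
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\rightarrow}\bigl(x_t,\, \overrightarrow{h}_{t-1}\bigr), && t = 1, \dots, N,\\
\overleftarrow{h}_t &= \mathrm{LSTM}_{\leftarrow}\bigl(x_t,\, \overleftarrow{h}_{t+1}\bigr), && t = N, \dots, 1,\\
s_t &= \bigl[\overrightarrow{h}_t ;\, \overleftarrow{h}_t\bigr], && t = 1, \dots, N,\\
z &= \max_{1 \le t \le N} s_t \quad \text{(element-wise)},\\
(S_k x)_t &= x_{((t + k - 1) \bmod N) + 1}, && k = 0, \dots, N - 1.
\end{aligned}
```

Because each forward state depends on where the scan starts, z computed from S_k x generally differs from z computed from x. Robustness to circular shifts therefore has to be demonstrated or built into the module; pooling alone does not guarantee it.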

Highlights

  • Inspired by the great success of deep learning in image classification [16], [17], many approaches to 3D shape recognition based on neural networks (NNs) have been proposed (e.g. [18]–[22]).

Summary

INTRODUCTION

Understanding 3D objects has been a fundamental problem since the establishment of computer vision, with a broad spectrum of applications including multimedia [1], augmented reality [2], [3], entertainment [4], robotics [5], [6], autonomous driving [7]–[10], 3D reverse engineering [11], [12], medical imaging [13], [14], and monitoring [15]. To exploit the long-term dependencies of view sequences for 3D shape recognition, we propose a sequence-aware view aggregation module based on the long short-term memory (LSTM) network [26]. Building this module into a standard view-based CNN yields an effective method for 3D shape recognition. Specifically, we treat the views of an object as a sequence and investigate how the bi-directional long-term dependencies of view sequences can be exploited for 3D shape recognition. We further propose an effective view-based CNN with a bi-directional LSTM-based aggregation module for 3D shape classification and retrieval. The proposed network analyzes the long-term semantic dependencies of view sequences in both directions, recognizes complex shapes, and is robust to circular shifts of the view sequence.
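The aggregation idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `BiLSTMViewAggregator`, the random untrained weights, the element-wise max pooling over views, and the `aggregate_shift_pooled` method that averages over all circular shifts are hypothetical choices made for this sketch. Shift averaging is only one simple way to make a descriptor exactly invariant to circular shifts; the text above does not say how the paper obtains its robustness.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LSTMCell:
    """Single LSTM cell with random (untrained) weights, for illustration only."""

    GATES = "ifog"  # input, forget, output, candidate

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight row per hidden unit and gate, acting on the concatenation [x; h_prev].
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid_dim)]
                         for _ in range(hid_dim)] for gate in self.GATES}
        self.b = {gate: [0.0] * hid_dim for gate in self.GATES}

    def step(self, x, h, c):
        xh = list(x) + list(h)
        pre = {gate: [sum(w * v for w, v in zip(row, xh)) + bias
                      for row, bias in zip(self.W[gate], self.b[gate])]
               for gate in self.GATES}
        i = [sigmoid(a) for a in pre["i"]]
        f = [sigmoid(a) for a in pre["f"]]
        o = [sigmoid(a) for a in pre["o"]]
        g = [math.tanh(a) for a in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def scan(cell, xs):
    """Run the cell over the sequence xs and return the hidden state at every step."""
    h, c = [0.0] * cell.hid_dim, [0.0] * cell.hid_dim
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


class BiLSTMViewAggregator:
    """Hypothetical sequence-aware view aggregator: BiLSTM + element-wise max pooling."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(in_dim, hid_dim, rng)
        self.bwd = LSTMCell(in_dim, hid_dim, rng)

    def per_view_states(self, views):
        fwd = scan(self.fwd, views)
        # Scan the reversed sequence, then flip back so index t refers to view t.
        bwd = scan(self.bwd, views[::-1])[::-1]
        return [hf + hb for hf, hb in zip(fwd, bwd)]  # concatenate both directions

    def aggregate(self, views):
        # Element-wise max over views gives a fixed-size shape descriptor.
        return [max(col) for col in zip(*self.per_view_states(views))]

    def aggregate_shift_pooled(self, views):
        # Average the descriptor over all N circular shifts of the view sequence.
        # Any rotated input produces the same set of shifts, so the result is
        # invariant to circular shifts (up to floating-point summation order).
        n = len(views)
        descs = [self.aggregate(views[k:] + views[:k]) for k in range(n)]
        return [sum(col) / n for col in zip(*descs)]


# Usage: 12 views with 4-D stand-in features (a real model would use CNN features).
rng = random.Random(1)
views = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(12)]
agg = BiLSTMViewAggregator(in_dim=4, hid_dim=3)
print(len(agg.aggregate(views)))  # 6 = 2 * hid_dim
```

Averaging over every shift multiplies the aggregation cost by the number of views, which is the price of exact invariance in this sketch. Because the weights here are random, only the descriptor size and the shift behaviour are meaningful; in a trained model the inputs would be CNN features of the rendered views and the LSTM weights would be learned.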

RELATED WORK
CONVOLUTIONAL VIEW FEATURE EXTRACTION
SEQUENCE-BASED VIEW AGGREGATION
EXPERIMENTS
Findings
CONCLUSION