Abstract

Cross-media 3D model recognition is an important and challenging task in computer vision, with applications such as landmark detection and image set classification. In recent years, with the development of deep learning, many approaches have been proposed for the 3D model recognition problem. However, these methods focus on structure-information representation and multi-view information fusion while ignoring spatial and temporal information, which makes them ill-suited to cross-media 3D model recognition. In this paper, we represent each 3D model by a sequence of views and propose a novel Multi-view Convolutional LSTM Network (MVCLN), which uses an LSTM structure to extract temporal information and convolutional operations to extract spatial information. In particular, both spatial and temporal information are considered during training, which effectively exploits the differences between the views' spatial information to improve the final performance. We also introduce a classic attention model to weight each view, reducing the redundancy in the views' spatial information during the fusion step. We evaluate the proposed method on ModelNet40 for the 3D model classification and retrieval tasks. We also construct a dataset from the overlapping categories of MV-RED, ShapeNetCore and ModelNet to demonstrate the effectiveness of our approach for cross-media 3D model recognition. Experimental results and comparisons with state-of-the-art methods demonstrate that our framework achieves superior performance.

Highlights

  • With the advanced development of digital techniques and computer vision [1]–[4], 3D models are widely used in our daily life, such as computer-aided design, medical diagnoses, bioinformatics, 3D printing, medical imaging, and digital entertainment [5]–[7]

  • We construct a dataset utilizing the overlapping categories of MV-RED, ShapeNetCore and ModelNet to demonstrate the effectiveness of our approach for cross-media 3D model recognition

  • Cross-media 3D model recognition is an important and challenging task in computer vision, with applications such as landmark detection and image set classification


Summary

INTRODUCTION

With the advanced development of digital techniques and computer vision [1]–[4], 3D models are widely used in daily life, for example in computer-aided design, medical diagnosis, bioinformatics, 3D printing, medical imaging, and digital entertainment [5]–[7]. Many existing approaches only use a pre-trained CNN to extract a feature vector for each view and then focus on information fusion. To address these two problems, we propose a novel Multi-view Convolutional LSTM Network (MVCLN), which employs a convolutional LSTM to extract temporal information while preserving spatial information during training. We build a bi-directional LSTM on top of the convolutional LSTM structure, which can exploit the differences in spatial information among the rendered views to improve the quality of the final descriptor.
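As a rough illustration of the two components described above, the sketch below implements a single convolutional LSTM step over a sequence of view images and a simple attention-weighted fusion of the per-view hidden states, in plain NumPy. This is not the paper's implementation: the gate layout, kernel size, global-average pooling, and the learned scoring vector `v` are all assumptions made for the example.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2D convolution (deep-learning cross-correlation).
    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H, W)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            # contract (C_in, k, k) patch against all output filters at once
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """One convolutional LSTM cell: gates are convolutions of x_t and h_{t-1}."""
    def __init__(self, c_in, c_hid, k, rng):
        self.wx = rng.normal(0, 0.1, (4 * c_hid, c_in, k, k))   # input->gates
        self.wh = rng.normal(0, 0.1, (4 * c_hid, c_hid, k, k))  # hidden->gates
        self.b = np.zeros((4 * c_hid, 1, 1))

    def step(self, x, h, c):
        z = conv2d_same(x, self.wx) + conv2d_same(h, self.wh) + self.b
        i, f, o, g = np.split(z, 4, axis=0)          # gate pre-activations
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                            # cell state update
        h = o * np.tanh(c)                           # hidden state (spatial map)
        return h, c

def attention_fuse(hidden_maps, v):
    """Pool each view's hidden map to a vector, score with v, softmax-fuse."""
    feats = np.stack([h.mean(axis=(1, 2)) for h in hidden_maps])  # (V, C)
    scores = feats @ v
    a = np.exp(scores - scores.max())
    a /= a.sum()                                     # per-view attention weights
    return (a[:, None] * feats).sum(axis=0), a

rng = np.random.default_rng(0)
cell = ConvLSTMCell(c_in=3, c_hid=8, k=3, rng=rng)
views = [rng.normal(size=(3, 6, 6)) for _ in range(4)]  # 4 rendered views
h = np.zeros((8, 6, 6))
c = np.zeros((8, 6, 6))
hidden_maps = []
for x in views:                                      # unroll over the view sequence
    h, c = cell.step(x, h, c)
    hidden_maps.append(h)
v = rng.normal(size=(8,))                            # hypothetical scoring vector
fused, alpha = attention_fuse(hidden_maps, v)        # final shape descriptor
```

In the actual MVCLN the recurrence is bi-directional and the attention weights are learned jointly with the network; the softmax-over-pooled-features scoring here is only one common way to realize such a weighting.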

RELATED WORKS
Findings
CONCLUSION
