Abstract

The accuracy of 3D viewpoint and shape estimation from 2D images has been greatly improved by machine learning, especially deep learning techniques such as the convolutional neural network (CNN). However, current methods are typically valid for only one specific category and perform poorly when generalized to other categories, so multi-class object images require multiple detectors or networks. In this paper, we propose a method with strong generalization ability that combines a single CNN with deformable model matching to estimate the 3D viewpoint and shape of objects from multiple classes. The CNN detects keypoints of the potential object in the image, while a deformable model matching stage performs 3D wireframe modeling and viewpoint estimation simultaneously, supported by the detected keypoints. In addition, parameter estimation by deformable model matching is robust to keypoint detection results that contain mistaken keypoints. The proposed method is evaluated on the Pascal3D+ dataset. Experiments show that it performs well in both parameter estimation accuracy and generalization across object classes. This research is a useful exploration of extending the generalization of deep learning in specific tasks.

Highlights

  • Estimating the 3D geometry of an object from a single image is an important but challenging task in computer vision [1]

  • Although current methods using deep learning can output the shape and viewpoint parameters in an end-to-end way and perform well in accuracy, most of them are limited to one specific category and perform poorly when generalized to other categories

  • We evaluate the proposed method on the Pascal3D+ dataset

Summary

Introduction

Estimating the 3D geometry of an object from a single image is an important but challenging task in computer vision [1]. For the 3D shape and viewpoint estimation problem, most existing methods focus on reconstructing 3D models of category-specific objects [4,5,6,7,8,9,10]. A deformable model of the specific category, such as a wireframe or mesh, is matched with the image to estimate the shape and the viewpoint. Although current methods using deep learning can output the shape and viewpoint parameters in an end-to-end way and perform well in accuracy, most of them are limited to one specific category and perform poorly when generalized to other categories.

Sci. 2019, 9, 1975
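To make the idea of matching a deformable model to image keypoints concrete, the following is a minimal sketch and not the method of the paper. It assumes an orthographic camera, a viewpoint described only by an azimuth angle, a wireframe with a single deformation basis, and known keypoint correspondences. The names `project`, `fit_wireframe`, `mean_shape`, `basis`, and `weights` are hypothetical, and the keypoints are synthetic rather than CNN detections.

```python
import math

def project(point, azimuth):
    """Rotate a 3D point about the vertical (y) axis, then drop depth (orthographic)."""
    x, y, z = point
    return (math.cos(azimuth) * x + math.sin(azimuth) * z, y)

def fit_wireframe(keypoints_2d, mean_shape, basis, weights=None, step_deg=1):
    """Grid-search the azimuth; solve the single shape coefficient in closed form.

    Assumed shape model: vertex_i = mean_shape[i] + alpha * basis[i].
    Returns (azimuth in degrees, alpha, weighted squared reprojection error).
    """
    n = len(mean_shape)
    w = weights or [1.0] * n  # e.g. keypoint confidences; uniform by default
    best = None
    for deg in range(0, 360, step_deg):
        az = math.radians(deg)
        # The projection is linear, so mean and basis can be projected separately.
        pm = [project(p, az) for p in mean_shape]
        pb = [project(p, az) for p in basis]
        num = den = 0.0
        for i in range(n):
            for k in range(2):
                num += w[i] * pb[i][k] * (keypoints_2d[i][k] - pm[i][k])
                den += w[i] * pb[i][k] ** 2
        alpha = num / den if den > 1e-12 else 0.0
        err = sum(
            w[i] * (keypoints_2d[i][k] - pm[i][k] - alpha * pb[i][k]) ** 2
            for i in range(n) for k in range(2)
        )
        if best is None or err < best[0]:
            best = (err, deg, alpha)
    return best[1], best[2], best[0]

# Hypothetical car-like wireframe: four bottom corners and four roof corners.
mean_shape = [(1, 0, 2), (-1, 0, 2), (-1, 0, -2), (1, 0, -2),
              (0.8, 1, 1), (-0.8, 1, 1), (-0.8, 1, -1.5), (0.8, 1, -1.5)]
# One deformation mode: stretches the body and raises the roof.
basis = [(0, 0, 0.5), (0, 0, 0.5), (0, 0, -0.5), (0, 0, -0.5),
         (0, 0.3, 0.2), (0, 0.3, 0.2), (0, 0.3, -0.2), (0, 0.3, -0.2)]

# Synthetic "detected" keypoints: true azimuth 40 degrees, true alpha 0.6.
true_shape = [tuple(m + 0.6 * b for m, b in zip(ms, bs))
              for ms, bs in zip(mean_shape, basis)]
observed = [project(p, math.radians(40)) for p in true_shape]

azimuth_deg, alpha, err = fit_wireframe(observed, mean_shape, basis)
print(azimuth_deg, round(alpha, 3))
```

In this sketch the optional `weights` argument could carry per-keypoint confidences so that unreliable detections contribute less to the fit. That is only a simple stand-in for the tolerance to mistaken keypoints described in the abstract, not the mechanism used by the authors.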
