Abstract

Current non-rigid structure from motion (NRSfM) algorithms are mainly limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the strong performance of our approach, which exceeds the precision and robustness of all available state-of-the-art methods by an order of magnitude. We further propose a quality measure (based on the network weights) that circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstruction.

Highlights

  • Building an AI capable of inferring the 3D structure and pose of an object from a single image is a problem of immense importance

  • Non-Rigid Structure from Motion (NRSfM) is not restricted to non-rigid objects; it can also be applied to rigid objects whose object categories deform non-rigidly [19, 2, 30]

  • Prior work reinterprets back-propagation through a deep neural network as learning the dictionaries {D_i}, i = 1, ..., n. This connection offers a novel breakthrough for understanding DNNs. We extend it to the block sparse scenario and apply it to solving our NRSfM problem

Summary

Introduction

Building an AI capable of inferring the 3D structure and pose of an object from a single image is a problem of immense importance. Non-Rigid Structure from Motion (NRSfM) offers computer vision a way out of this quandary by recovering the pose and 3D structure of an object category solely from hand-annotated 2D landmarks, with no need for 3D supervision. NRSfM is well noted in the literature as an ill-posed problem due to non-rigidity. This has mainly been addressed by imposing additional shape priors, e.g. low rank [6, 10], union-of-subspaces [36, 2], and block sparsity [18, 19]. Based on this observation, we propose a novel shape prior that employs hierarchical sparse coding, and demonstrate that 2D projections under orthogonal cameras can be represented by the hierarchical dictionaries in a block sparse way. Both quantitative and qualitative results demonstrate our superior performance, outperforming all state-of-the-art methods by an order of magnitude.
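To make the block sparse prior concrete, the sketch below shows a minimal block ISTA (iterative shrinkage-thresholding) solver in NumPy: given a dictionary D and a measurement x, it recovers a code z whose coefficient blocks are sparse as a group. This is an illustrative single-layer sketch under assumed names (block_ista, block_soft_threshold, blocks as index lists), not the paper's multi-layer network; the paper unrolls iterations of this kind into DNN layers with learned dictionaries.

```python
import numpy as np

def block_soft_threshold(z, blocks, tau):
    """Proximal operator of the group-lasso penalty tau * sum_b ||z_b||_2:
    shrink each block of z toward zero, zeroing blocks with small norm."""
    out = np.zeros_like(z)
    for b in blocks:
        norm = np.linalg.norm(z[b])
        if norm > tau:
            out[b] = (1.0 - tau / norm) * z[b]
    return out

def block_ista(x, D, blocks, lam=0.1, n_iter=100):
    """Minimize 0.5 * ||x - D z||^2 + lam * sum_b ||z_b||_2
    by gradient steps on the data term followed by block shrinkage."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)        # gradient of the quadratic data term
        z = block_soft_threshold(z - grad / L, blocks, lam / L)
    return z
```

Stacking several such coding steps, each with its own dictionary, yields the hierarchical (multi-layer) block sparse representation; back-propagating through the unrolled iterations then corresponds to learning those dictionaries.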

Related Work
Background
Deep Non-Rigid Structure from Motion
Modeling via multi-layer sparse coding
Multi-layer block sparse coding
Block ISTA and DNNs solution
Experiments
SfC on IKEA furniture
Methods
Large-scale NRSfM on CMU MoCap
Coherence as guide
Findings
Conclusion