Planar Structure-from-Motion with Affine Camera Models: Closed-Form Solutions, Ambiguities and Degeneracy Analysis.

Toby Collins, Adrien Bartoli

doi:10.1109/tpami.2016.2578333

Abstract

Planar Structure-from-Motion (SfM) is the problem of reconstructing a planar object or surface from a set of 2D images using motion information. The problem is well understood with the perspective camera model and can be solved with Homography Decomposition (HD). However, when the structure is small and/or viewed far from the camera, the perspective effects diminish, and in the limit the projections become affine. In these situations HD fails because the problem itself becomes ill-posed. We propose a stable alternative using affine camera models. These have been used extensively to reconstruct non-planar structures; however, a general, accurate and closed-form method for planar structures has been missing. The problem is fundamentally different with planar structures because the types of affine camera models one can use are more restricted and the problem is inherently more ambiguous and non-linear. We provide a closed-form method for the orthographic camera model that solves the general problem (three or more views with three or more correspondences and missing correspondences) and returns all metric structure solutions and corresponding camera poses. The method does not require initialisation, and optimises an objective function that is very similar to the reprojection error. In fact, there is no clear benefit in refining its solutions with bundle adjustment, which is a remarkable result. We also present a new theoretical analysis that deepens our understanding of the problem. The main result is the necessary and sufficient geometric conditions for the problem to be degenerate with the orthographic camera. We also show that there can exist up to two solutions for metric structure with four or more views (previously the solution was assumed to be unique), and we give the necessary and sufficient geometric conditions for disambiguation. Other theoretical results include showing that in the case of three images the optimal reconstruction (with respect to reprojection error) can usually be found in closed form, and identifying the additional prior knowledge needed to solve with non-orthographic affine cameras.
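
As a minimal illustration of the orthographic model mentioned in the abstract (a sketch under stated assumptions, not the authors' method): if the planar structure is placed on the plane z = 0, orthographic projection discards depth, so each image point is a 2D affine function of the plane coordinates that depends only on the upper-left 2x2 block of the rotation and a 2D translation. The function names `rotation_xyz`, `project_orthographic` and `flip_tilt`, and the Euler-angle convention, are hypothetical choices made for this example. The last function shows the standard single-view observation that a second rotation shares the same 2x2 block and therefore produces the same image of the plane; it is not a reproduction of the multi-view ambiguity analysis in the paper.

```python
import math


def rotation_xyz(ax, ay, az):
    """Rotation matrix R = Rz(az) * Ry(ay) * Rx(ax), returned as nested lists."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def project_orthographic(R, t, point_on_plane):
    """Project a point (x, y) on the plane z = 0; the depth coordinate is discarded."""
    x, y = point_on_plane
    u = R[0][0] * x + R[0][1] * y + t[0]
    v = R[1][0] * x + R[1][1] * y + t[1]
    return (u, v)


def flip_tilt(R):
    """Return S * R * S with S = diag(1, 1, -1).

    The result is again a rotation and has the same upper-left 2x2 block as R,
    so it projects points on the plane z = 0 to the same image points.
    """
    s = [1.0, 1.0, -1.0]
    return [[s[i] * s[j] * R[i][j] for j in range(3)] for i in range(3)]


# Assumed example pose (angles in radians) and image-plane translation.
R = rotation_xyz(0.4, -0.3, 1.1)
t = (2.0, -1.0)

# Affine maps preserve midpoints: the projected midpoint of two plane points
# equals the midpoint of their projections.
a, b = (0.0, 0.0), (2.0, 1.0)
mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
pa = project_orthographic(R, t, a)
pb = project_orthographic(R, t, b)
pm = project_orthographic(R, t, mid)

# A different rotation that yields the same image of the plane.
R_flipped = flip_tilt(R)
pb_flipped = project_orthographic(R_flipped, t, b)
```

Under perspective projection the midpoint property would not hold in general, which is one way to see why the affine case has to be treated separately from homography decomposition.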

Full Text