We present a fast, robust algorithm for multi-frame structure from motion from point features that works for general motion and large perspective effects. The algorithm is for point features but easily extends to a direct method based on image intensities. Experiments on synthetic and real sequences show that the algorithm gives results nearly as accurate as the maximum likelihood estimate in a couple of seconds on an IRIS 10000. The results are significantly better than those of an optimal two-image estimate. When the camera projection is close to scaled orthographic, the accuracy is comparable to that of the Tomasi/Kanade algorithm, and the algorithms are comparably fast. The algorithm incorporates a quantitative theoretical analysis of the bas-relief ambiguity and exemplifies how such an analysis can be exploited to improve reconstruction. Also, we demonstrate a structure-from-motion algorithm for partially calibrated cameras, with unknown focal length varying from image to image. Unlike the projective approach, this algorithm fully exploits the partial knowledge of the calibration. It is given by a simple modification of our algorithm for calibrated sequences and is insensitive to errors in calibrating the camera center. Theoretically, we show that unknown focal-length variations strengthen the effects of the bas-relief ambiguity. This paper includes extensive experimental studies of two-frame reconstruction and the Tomasi/Kanade approach in comparison to our algorithm. We find that two-frame algorithms are surprisingly robust and accurate, despite some problems with local minima. We demonstrate experimentally that a nearly optimal two-frame reconstruction can be computed quickly, by a minimization in the motion parameters alone. Lastly, we show that a well-known problem with the Tomasi/Kanade algorithm is often not a significant one.