The problem considered here involves the use of a sequence of monocular images of a three-dimensional moving object to estimate both its structure and kinematics. The object is assumed to be rigid, and its motion is assumed to be smooth. A set of object match points is assumed to be available, consisting of fixed features on the object, the image-plane coordinates of which have been extracted from successive images in the sequence. The measured data are the noisy image-plane coordinates of this set of object match points, taken from each image in the sequence. In previous papers [IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 90 (1986); in Proceedings of the IEEE Workshop on Motion: Representation and Analysis (Institute of Electrical and Electronics Engineers, New York, 1986), p. 95; in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, New York, 1986), p. 176] we discussed model-based approaches for motion and structure estimation from a long sequence of images. We examine here the performance of such techniques by using Cramér–Rao lower bounds on the estimation error variance. This method permits a priori prediction of estimation accuracy as a function of a number of factors, including the number of images in the sequence, the time at which each image is taken, the number of feature points used, the image-plane noise level, and the type of motion that is involved. Theoretical performance predictions are compared with the statistics of Monte Carlo simulations, and it is shown that the actual estimation accuracy is close to the Cramér–Rao bounds in most cases. These results also show that noisy sequences with fewer than four images often do not contain enough information to permit accurate estimation of motion and structure parameters. This conclusion is consistent with the observed instability of so-called two-frame estimation methods in the presence of noise.
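
As an illustration of how such a bound depends on the factors listed above, the following sketch writes out the standard Cramér–Rao inequality for one assumed measurement model. The notation is hypothetical and is not taken from the paper: \theta is the vector of p unknown structure and motion parameters, z_{ik} is the measured image-plane position of feature point i (i = 1, ..., N) in the image taken at time t_k (k = 1, ..., K), h_i(\theta, t_k) is the noise-free projection of that point, and the noise n_{ik} is assumed to be zero-mean Gaussian, independent across points and images, with covariance \sigma^2 I_2.

\[
z_{ik} = h_i(\theta, t_k) + n_{ik}, \qquad n_{ik} \sim \mathcal{N}(0, \sigma^2 I_2)
\]
\[
J(\theta) = \frac{1}{\sigma^2} \sum_{k=1}^{K} \sum_{i=1}^{N} H_{ik}^{T} H_{ik}, \qquad H_{ik} = \frac{\partial h_i(\theta, t_k)}{\partial \theta} \in \mathbb{R}^{2 \times p}
\]
\[
\mathrm{E}\!\left[ (\hat{\theta} - \theta)(\hat{\theta} - \theta)^{T} \right] \succeq J(\theta)^{-1}, \qquad \mathrm{var}(\hat{\theta}_j) \ge \left[ J(\theta)^{-1} \right]_{jj}
\]

The inequality holds for any unbiased estimator \hat{\theta}, provided J(\theta) is invertible. Under these assumptions the Fisher information J(\theta) grows with the number of images K and of feature points N, scales as 1/\sigma^2 with the noise level, and depends on the sampling times t_k and the type of motion through the Jacobians H_{ik}. When the sequence is short, J(\theta) can be nearly singular, in which case the bound on some parameters becomes very large.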