Abstract

We describe model-based vision systems in terms of four components: models, prediction of image features, description of image features, and interpretation, which relates image features to models. We detail the modelling, prediction and interpretation components of an implemented model-based vision system. Both generic object classes and specific objects are represented by volume models that are independent of viewpoint. We model complex real-world object classes; variations of size, structure and spatial relations within object classes can be modelled. New spatial reasoning techniques are described which are useful both for prediction within a vision system and for planning within a manipulation system. We introduce new approaches to prediction and interpretation based on the propagation of symbolic constraints. Predictions are two-pronged. First, prediction graphs provide a coarse filter for hypothesizing matches of objects to image features. Second, they contain instructions on how to use measurements of image features to deduce three-dimensional information about tentative object interpretations. Interpretation proceeds by merging local hypothesized matches, subject to consistent derived implications about the size, structure and spatial configuration of the hypothesized objects. Prediction, description and interpretation proceed concurrently, from coarse interpretations of images in terms of object subparts and classes to fine distinctions among object subclasses and more precise three-dimensional quantification of objects. We distinguish our implementations from the fundamental geometric operations required by our general image understanding scheme, and we suggest directions for future research on improved algorithms and representations.
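The interpretation step described above, merging local hypothesized matches only while their derived implications about the object remain consistent, can be sketched as interval-style constraint propagation. The following is an illustrative sketch only, not code from the paper; the parameter names (`length_cm`, `tilt_deg`, `width_cm`) are invented for the example.

```python
# Illustrative sketch (assumed representation): each local match of image
# features to a model yields symbolic constraints, here represented as
# closed numeric intervals on 3-D parameters of the hypothesized object.

def intersect(a, b):
    """Intersect two closed intervals; return None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def merge(h1, h2):
    """Merge two hypotheses (dicts mapping parameter -> interval).

    Returns the merged hypothesis, or None if any shared parameter's
    constraint becomes empty, i.e. the local matches are inconsistent.
    """
    merged = dict(h1)
    for param, ival in h2.items():
        if param in merged:
            ival = intersect(merged[param], ival)
            if ival is None:
                return None  # inconsistent: reject the merge
        merged[param] = ival
    return merged

# Two local matches against the same object model, each bounding some
# 3-D parameters from measurements of different image features:
hyp_a = {"length_cm": (40.0, 90.0), "tilt_deg": (0.0, 15.0)}
hyp_b = {"length_cm": (60.0, 120.0), "width_cm": (10.0, 30.0)}

combined = merge(hyp_a, hyp_b)
# combined["length_cm"] narrows to (60.0, 90.0), satisfying both matches.
```

Merging an incompatible pair (for example, two matches implying disjoint length intervals) returns `None`, so the interpreter would keep those hypotheses separate rather than combine them.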
