We present a method for learning a set of generative models suitable for representing selected image-domain features of a scene as a function of changes in the camera viewpoint. Such models are important for robotic tasks, such as probabilistic position estimation (i.e., localization), as well as for visualization. Our approach entails the automatic selection of the features and the synthesis of models of their visual behavior. The proposed model can generate maximum-likelihood views and can also return a measure of the likelihood of a particular view from a particular camera position. The models are trained by regularizing observations of the features taken from known camera locations. The uncertainty of each model is evaluated using cross-validation, which allows features and their attributes to be assessed a priori. The features themselves are initially selected as salient points by a measure of visual attention and are then tracked across multiple views. While this work is motivated by robot localization, the results have implications for image interpolation, image-based scene reconstruction, and object recognition. This paper presents a formulation of the problem and illustrative experimental results.
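As a rough illustration of the training and evaluation pipeline described above, the sketch below fits a regularized radial-basis interpolant that maps known camera poses to an observed feature attribute (here, a scalar such as an image coordinate) and scores the fit with leave-one-out cross-validation as an a priori measure of reliability. The Gaussian kernel, parameter names, and synthetic data are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np

def fit_rbf(poses, values, sigma=0.5, lam=1e-3):
    """Fit a regularized radial-basis interpolant mapping camera pose
    to an observed feature attribute (illustrative choice of model)."""
    d2 = np.sum((poses[:, None, :] - poses[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian kernel matrix
    # Regularized least squares: (K + lam*I) w = values
    return np.linalg.solve(K + lam * np.eye(len(poses)), values)

def predict(poses_train, w, pose_query, sigma=0.5):
    """Predict the feature attribute at a novel camera pose."""
    d2 = np.sum((poses_train - pose_query) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ w

def loo_cv_error(poses, values, sigma=0.5, lam=1e-3):
    """Leave-one-out cross-validation error: an a priori estimate of
    how reliably this feature attribute can be modeled."""
    errs = []
    for i in range(len(poses)):
        keep = np.arange(len(poses)) != i
        w = fit_rbf(poses[keep], values[keep], sigma, lam)
        pred = predict(poses[keep], w, poses[i], sigma)
        errs.append((pred - values[i]) ** 2)
    return float(np.mean(errs))

# Hypothetical example: a feature attribute observed from known 2-D camera poses
poses = np.random.rand(30, 2)                       # known camera (x, y) positions
values = 3.0 * poses[:, 0] - 1.5 * poses[:, 1] + 0.05 * np.random.randn(30)
print("LOO-CV error:", loo_cv_error(poses, values))
```

In this reading, features whose cross-validation error is low would be retained as reliable landmarks, while poorly modeled features could be discarded before localization.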