In the last few years, there has been considerable interest in scene parsing, the task of assigning a predefined class label to each pixel (or pre-segmented region) in an image. To address the complexity of this task, we first propose a new geometric retrieval strategy that selects nearest neighbors from a database of fully segmented and annotated images. We then introduce a novel and simple energy-minimization model whose cost function efficiently combines different global nonparametric semantic likelihood energy terms. These terms are computed from the (pre-)segmented regions of the query image and their structural properties (location, texture, color, context, and shape). Unlike traditional approaches, we optimize our energy-based model with a simple, local procedure derived from the iterated conditional modes (ICM) algorithm. Experimental results on two challenging datasets, 1) the Microsoft Research Cambridge dataset and 2) the Stanford Background dataset, demonstrate the feasibility and success of the proposed approach. Compared to existing annotation methods that require training classifiers for each object and learning many parameters, our method is easy to implement, has few parameters, and combines different criteria.
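To make the optimization step concrete, the following is a minimal sketch of an iterated conditional modes (ICM) style sweep over pre-segmented regions. It is not the authors' implementation: the function name `icm_label_regions`, the `unary` cost table, the `neighbors` adjacency map, and the Potts-style disagreement penalty standing in for the context term are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def icm_label_regions(
    unary: Sequence[Sequence[float]],
    neighbors: Dict[int, List[int]],
    pairwise_weight: float = 1.0,
    max_sweeps: int = 20,
) -> List[int]:
    """Greedy ICM-style labeling of regions (illustrative sketch).

    unary[r][c] is a hypothetical cost of assigning class c to region r,
    e.g. a weighted sum of location, texture, color, and shape terms.
    neighbors[r] lists the regions adjacent to region r. A Potts penalty
    (one unit per disagreeing neighbor) is assumed for the context term.
    """
    n_regions = len(unary)
    n_classes = len(unary[0])

    # Initialize each region with the class of lowest unary cost.
    labels = [
        min(range(n_classes), key=lambda c: unary[r][c])
        for r in range(n_regions)
    ]

    def local_energy(r: int, c: int) -> float:
        # Energy contribution of region r if it takes class c,
        # with all other regions held at their current labels.
        disagreements = sum(1 for q in neighbors.get(r, []) if labels[q] != c)
        return unary[r][c] + pairwise_weight * disagreements

    for _ in range(max_sweeps):
        changed = False
        for r in range(n_regions):
            best = min(range(n_classes), key=lambda c: local_energy(r, c))
            # Accept only strict improvements so the total energy never rises.
            if local_energy(r, best) < local_energy(r, labels[r]):
                labels[r] = best
                changed = True
        if not changed:
            break  # Local minimum reached: no single-region change helps.
    return labels
```

Each update changes one region's label while the rest stay fixed, so the procedure converges to a local minimum of the assumed energy rather than a global one, which is the usual trade-off of ICM-type schemes in exchange for simplicity.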