Abstract

This paper presents a fast, precise, and highly scalable semantic segmentation algorithm that incorporates several kinds of local appearance features, example-based spatial layout priors, and neighborhood-level and global contextual information. The method works at the level of image patches. In the first stage, codebook-based local appearance features are regularized and reduced in dimension using latent topic models, combined with spatial layout features based on spatial pyramid matching, and fed into logistic regression classifiers to produce an initial patch-level labeling. In the second stage, these labels are combined with patch-neighborhood and global aggregate features using either a second layer of logistic regression or a Conditional Random Field (CRF). Finally, the patch-level results are refined to pixel level using MRF- or over-segmentation-based methods. The CRF is trained using a fast maximum-margin approach. Comparative experiments on four multi-class segmentation datasets show that each of the above elements improves the results, leading to a scalable algorithm that is both faster and more accurate than existing patch-level approaches.
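The two-stage classification described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses random synthetic patch features on a toy grid, trains a stage-1 logistic regression on per-patch appearance features, and then trains a stage-2 logistic regression on the stage-1 label probabilities augmented with 4-neighborhood averages and a global label histogram (crude stand-ins for the paper's neighborhood-level and global aggregate features; all names and dimensions are illustrative).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for patch descriptors: a 10x20 grid of patches,
# 8-D appearance features, 3 candidate labels (all values illustrative).
n_rows, n_cols, n_feat, n_labels = 10, 20, 8, 3
labels = rng.integers(0, n_labels, size=(n_rows, n_cols))
# Shift each patch's features by its label so the classes are separable.
X = rng.normal(size=(n_rows * n_cols, n_feat)) + labels.ravel()[:, None]

# Stage 1: per-patch logistic regression on local appearance features.
stage1 = LogisticRegression(max_iter=1000).fit(X, labels.ravel())
probs = stage1.predict_proba(X).reshape(n_rows, n_cols, n_labels)

# Stage 2: augment each patch with the mean label probabilities of its
# 4-neighborhood plus the global label histogram, then reclassify.
padded = np.pad(probs, ((1, 1), (1, 1), (0, 0)), mode="edge")
neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
         padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
global_hist = probs.mean(axis=(0, 1))
X2 = np.hstack([
    probs.reshape(-1, n_labels),          # stage-1 beliefs
    neigh.reshape(-1, n_labels),          # neighborhood aggregate
    np.tile(global_hist, (n_rows * n_cols, 1)),  # global aggregate
])
stage2 = LogisticRegression(max_iter=1000).fit(X2, labels.ravel())

acc1 = stage1.score(X, labels.ravel())
acc2 = stage2.score(X2, labels.ravel())
print(f"stage-1 accuracy: {acc1:.3f}, stage-2 accuracy: {acc2:.3f}")
```

In the paper the stage-2 model may instead be a CRF trained with a maximum-margin objective, and the patch labels are subsequently refined to pixel level; the sketch only conveys the feature-stacking structure of the two stages.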

Highlights

  • Semantic scene segmentation—object-level scene labeling—is playing an increasingly important role in the fields of low, mid, and high-level computer vision

  • Relative to the stage-1 Logistic Regression Classifier (LRC), including the stage-2 Conditional Random Field (CRF) improves the results by about 3% for nearest-patch labeling and 4% for over-segmentation labeling

  • Our LRC/CRF classifier improves the state of the art [21] by 0.1%


Introduction

Semantic scene segmentation—object-level scene labeling—is playing an increasingly important role in the fields of low-, mid-, and high-level computer vision. Semantic segmentation remains challenging due to the "aperture problem" of local ambiguity. Various forms of contextual information have been introduced to reduce this ambiguity, notably random fields that enhance the local coherence of regions and transitions, topic models that enhance the image-wide relevance of the labels used, and spatial priors that encode the expected absolute or relative image positions of the various labels. Traditionally, labeling algorithms worked with individual pixels, but recent efforts often achieve higher efficiency and consistency by working with patches or superpixels (small groups of similar pixels). We use a regular patch-based representation for ease of image description and of inference within our random field framework.

