Abstract

Semantic image segmentation, as one of the most popular tasks in computer vision, has been widely used in autonomous driving, robotics and other fields. Currently, deep convolutional neural networks (DCNNs) are driving major advances in semantic segmentation due to their powerful feature representation. However, DCNNs extract high-level feature representations by strided convolution, which makes it difficult to segment foreground objects precisely, especially when locating object boundaries. This paper presents a novel semantic segmentation algorithm that combines DeepLab v3+ with the superpixel segmentation algorithm quick shift. DeepLab v3+ is employed to generate a class-indexed score map for the input image, while quick shift is applied to segment the input image into superpixels. Their outputs are then fed into a class voting module to refine the semantic segmentation results. Extensive experiments on the proposed semantic image segmentation method are performed on the PASCAL VOC 2012 dataset, and the results show that the proposed method provides a more efficient solution.
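A minimal sketch of the refinement step described above, assuming a hypothetical `score_map` (an H×W class-index map produced by DeepLab v3+) and an RGB `image` are given; quick shift is taken from scikit-image, and the class voting is implemented here as a plain majority vote inside each superpixel. Parameter names and the helper function are illustrative, not the paper's implementation.

```python
import numpy as np
from skimage.segmentation import quickshift


def refine_with_superpixel_voting(image, score_map, kernel_size=5, max_dist=10, ratio=0.5):
    """Refine a per-pixel class map by majority voting within quick-shift superpixels."""
    # Segment the RGB image into superpixels (one integer label per pixel).
    superpixels = quickshift(image, kernel_size=kernel_size,
                             max_dist=max_dist, ratio=ratio)
    refined = score_map.copy()
    for sp_label in np.unique(superpixels):
        mask = superpixels == sp_label
        # Assign every pixel of this superpixel the most frequent class inside it,
        # snapping predicted regions to superpixel (and hence object) boundaries.
        classes, counts = np.unique(score_map[mask], return_counts=True)
        refined[mask] = classes[np.argmax(counts)]
    return refined
```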

Highlights

  • Semantic image segmentation is a typical computer vision problem

  • Most deep convolutional neural networks (DCNNs) for semantic segmentation are based on a common pioneer: the fully convolutional network (FCN) proposed by Long et al. [5]

  • To tackle this challenging problem, this paper presents a novel method to refine the object boundaries of the segmentation results output by DeepLab v3+, uniting the benefits of DeepLab v3+ with the superpixel segmentation algorithm quick shift [19]


Summary

Introduction

Semantic image segmentation is a typical computer vision problem. Its task is to assign a category to each pixel in an image according to the object of interest [1]. FCN is considered a milestone in deep learning techniques for semantic segmentation, since it demonstrates how DCNNs can be trained end-to-end to solve this problem, efficiently learning to produce dense pixel-level predictions for inputs of arbitrary size. DeepLab v2 combines atrous convolution, ASPP and a fully connected CRF, achieving 79.9% mIOU on the PASCAL VOC 2012 dataset. One of the main remaining problems is that DCNNs for semantic segmentation consist of strided pooling and convolution layers: they increase the receptive field and aggregate context information, but discard boundary information in the process.
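A minimal sketch (not taken from the paper) contrasting the two operations mentioned above: a strided convolution halves the spatial resolution, whereas an atrous (dilated) convolution enlarges the receptive field while keeping full resolution. Layer names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # dummy input feature map

# Strided convolution: larger receptive field, but resolution is halved.
strided = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)

# Atrous convolution with rate 2: same receptive-field growth, resolution preserved.
atrous = nn.Conv2d(3, 8, kernel_size=3, dilation=2, padding=2)

print(strided(x).shape)  # torch.Size([1, 8, 32, 32]) -- boundary detail lost with downsampling
print(atrous(x).shape)   # torch.Size([1, 8, 64, 64]) -- dense prediction at full resolution
```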

Methodology
Motivation
Main Process
Architecture
Training Details
Quick Shift
Class Voting
Experimental Design
Qualitative Evaluation
Quantitative Evaluation
Why Quick Shift Is Superior to SLIC and Felzenszwalb
Findings
The Influence of Parameter σ on Segmentation Results
Conclusions