Abstract

The semantic segmentation (SS) task aims to produce a dense classification by labeling, at the pixel level, each object present in an image. Convolutional neural network (CNN) approaches have been widely used and have exhibited the best results in this task. However, the loss of spatial precision in the results is a major drawback that has not been solved. In this work, we propose a multi-task approach that complements the semantic segmentation task with edge detection, semantic contour, and distance transform tasks. We propose that, by sharing a common latent space, the complementary tasks can produce more robust representations that enhance the semantic labels. We explore the influence of contour-based tasks on the latent space, as well as their impact on the final SS results. We demonstrate the effectiveness of learning in a multi-task setting for hourglass models on the Cityscapes, CamVid, and Freiburg Forest datasets, improving on the state of the art without any refinement post-processing.
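
To make the contour-based auxiliary tasks more concrete, the following minimal sketch shows one way contour and distance-transform targets could be derived from a semantic label map. It is an illustration only: the function names `contour_mask` and `distance_to_contour`, the 4-neighbour contour rule, and the city-block distance are assumptions made here, not details taken from the paper.

```python
from collections import deque

# Hypothetical helpers for illustration; the paper's actual target
# generation may differ (e.g. another neighbourhood or distance metric).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def contour_mask(labels):
    """Mark pixels that have at least one 4-neighbour with a different label."""
    h, w = len(labels), len(labels[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = 1
                    break
    return mask


def distance_to_contour(mask):
    """City-block distance from each pixel to the nearest contour pixel.

    Uses a multi-source breadth-first search seeded at all contour pixels.
    Pixels stay None if the mask contains no contour pixel at all.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


if __name__ == "__main__":
    # Toy label map: class 0 on the left, class 1 in the last column.
    labels = [
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    mask = contour_mask(labels)
    print(mask)
    print(distance_to_contour(mask))
```

In a multi-task setting, maps like these would serve as extra supervision targets alongside the semantic labels, with all output heads reading from the same shared latent representation.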

Highlights

  • Humans possess a remarkable ability to parse images by looking at them

  • We describe a set of empirical studies to show how adding or removing contour-based auxiliary tasks affects the semantic segmentation task

  • In this paper, we incorporated auxiliary contour-based tasks to address the loss of spatial precision


Summary

Introduction

In the blink of an eye, a human can fully analyze an image and separate all its components. People can perform several tasks simultaneously when analyzing an image, e.g., object detection and contour detection. Although humans enjoy an inherent capacity for generalization, they lack the processing power of computers. The separation of an image into its components (i.e., joining pixels into regions) according to some features is called image segmentation [1]. Reproducing this process at or above the human level on a computer is not an easy task, and several approaches have been proposed to address it [2]. The segmentation task continues to be challenging mainly due to variability, i.e., when the visual tasks are performed on a computer, there is considerable variation in pose, appearance, viewpoint, illumination, and occlusion throughout different instances of
