Abstract

The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects differ not only in scale and appearance but also in semantics. Previous works focus on encoding multi-scale contextual information (via pooling or atrous convolutions), generally on top of compact high-level features, i.e., at a single stage. In this work, we argue that a rich set of cues exists at multiple stages of the network, encapsulating low-, mid- and high-level scene details. Therefore, an optimal scene parsing model must aggregate multi-scale context at all three levels of the feature hierarchy, a capability lacking in state-of-the-art scene parsing models. To address this limitation, we introduce a novel architecture with three new blocks that systematically aggregate low-, mid- and high-tier features. The heart of our approach is a high-level feature aggregation module that augments sparsely connected atrous convolutions with dense local and layer-wise connections to avoid gridding artifacts. In addition, we employ a novel feature pyramid augmentation and semantic refinement unit to generate low- and mid-level features that are mixed with the high-level features at the decoder. We extensively evaluate our proposed approach on the large-scale Cityscapes and ADE20K benchmarks. Our approach surpasses many recent models on both datasets, achieving mean intersection-over-union (mIoU) scores of 80.5% and 44.0% on Cityscapes and ADE20K, respectively.
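For reference, the reported mIoU metric averages per-class intersection-over-union: for each class, the overlap between predicted and ground-truth pixels divided by their union. A minimal NumPy sketch (the function name and arguments are illustrative, not from the paper):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union across classes.

    pred, gt: integer label maps of identical shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```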

Highlights

  • Given an image, the goal of semantic segmentation is to assign a category label to each pixel [1], [2]

  • The Atrous Spatial Pyramid Pooling (ASPP) module in DeepLabv2 [8] and v3 [9] applies parallel atrous convolutions with different dilation rates to extract multi-scale context information, but it operates only on a high-level feature representation (see the sketch after this list)

  • CONTRIBUTIONS: We propose an approach with the following main contributions: (a) We propose a feature-pyramid-based augmentation module to efficiently generate refined low-level features that preserve local details. (b) For mid-level multi-scale feature fusion, we propose a semantic refinement unit that combines a diverse set of features from the network encoder. (c) The central component of our model is a high-level context aggregation block that augments sparsely connected atrous convolutions with dense local and layer-wise connections
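The ASPP design referenced above is well documented in the DeepLab papers: parallel atrous branches whose outputs are concatenated and projected. A minimal PyTorch sketch, simplified (no image-level pooling branch); the class name, channel arguments, and the dilation rates (1, 6, 12, 18, the commonly used values) are illustrative assumptions, not taken from this paper:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel atrous convolutions
    with different dilation rates, concatenated and projected."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        # One branch per rate: a 1x1 conv for rate 1, dilated 3x3 convs otherwise.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=1 if r == 1 else 3,
                      padding=0 if r == 1 else r, dilation=r, bias=False)
            for r in rates
        ])
        # Fuse the concatenated branch outputs back to out_ch channels.
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```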


Summary

INTRODUCTION

The goal of semantic segmentation is to assign a category label to each pixel [1], [2]. PSPNet [7] applies pooling operations with different sub-sampling rates, all arranged in parallel, to capture context information. However, its pyramid pooling module works only on the last convolutional layer's features, which generally lack local scene details. The Atrous Spatial Pyramid Pooling (ASPP) module in DeepLabv2 [8] and v3 [9] applies parallel atrous convolutions with different dilation rates to extract multi-scale context information, but it likewise operates on a high-level feature representation. Our work is based on the insight that a dilated convolution expands its kernel size by interleaving the weights with zeros, which equates to dropping the intermediate activations in the input feature map. To alleviate this problem, we propose to combine the strengths of dilated (sparse) and wider (dense) kernels, which enhances the discriminative power of the network and avoids unfairly neglecting local information, as is the case with atrous convolution. Our approach achieves 80.5% and 44.0% mIoU scores on the Cityscapes and ADE20K datasets, respectively, outperforming the best reported results in [7], [17]
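The summary does not specify the exact layer design, so the following is only a hedged illustration of the stated idea: pairing a dilated ("sparse") 3x3 kernel, which enlarges the receptive field but skips intermediate activations, with a standard ("dense") 3x3 kernel that covers them, so the skipped activations still reach the output. The class name and channel/dilation parameters are hypothetical:

```python
import torch
import torch.nn as nn

class SparseDenseConv(nn.Module):
    """Illustrative only, not the paper's exact block: sum a dilated
    ("sparse") 3x3 convolution with a standard ("dense") 3x3 convolution
    so activations skipped by the dilated kernel still contribute."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Dilated kernel: wide receptive field, sparse input sampling.
        self.sparse = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=dilation, dilation=dilation, bias=False)
        # Standard kernel: dense coverage of local neighbours.
        self.dense = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.bn(self.sparse(x) + self.dense(x)))
```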

RELATED WORK
EXPERIMENTS
ABLATION STUDY
Findings
CONCLUSION