Abstract
The main challenges of semantic segmentation in vehicle-mounted scenes are object scale variation and the trade-off between model accuracy and efficiency. Lightweight backbone networks for semantic segmentation usually extract only single-scale features layer by layer, using a fixed receptive field. Most modern real-time semantic segmentation networks heavily compromise spatial details when encoding semantics and sacrifice accuracy for speed. Many improvement strategies adopt dilated convolution or add a sub-network, which introduces either intensive computation or redundant parameters. We propose a multi-level and multi-scale feature aggregation network (MMFANet). A spatial pyramid module is designed by cascading dilated convolutions with different receptive fields to extract multi-scale features layer by layer. Subsequently, a lightweight backbone network is built by reducing the feature channel capacity of the module. To improve the accuracy of our network, we design two additional modules that separately capture spatial details and high-level semantics from the backbone network without significantly increasing the computational cost. Comprehensive experimental results show that our model achieves 79.3% mIoU on the Cityscapes test dataset at 58.5 FPS, and it is more accurate than SwiftNet (75.5% mIoU). Furthermore, our model has at least 53.38% fewer parameters than other state-of-the-art models.
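To make the cascaded dilated-convolution idea in the abstract concrete, the PyTorch sketch below stacks 3×3 convolutions with increasing dilation rates so that each stage sees a larger receptive field, then fuses the per-stage outputs with a 1×1 convolution. The class name, channel widths, and dilation rates are illustrative assumptions, not the paper's exact module configuration.

```python
import torch
import torch.nn as nn

class CascadedDilatedConvs(nn.Module):
    """Illustrative sketch of cascading dilated convolutions.

    Each stage applies a 3x3 convolution with a larger dilation rate than
    the previous one, so successive stages cover larger receptive fields;
    every intermediate output is kept and fused, exposing several scales
    from a single layer. This is NOT the exact module from the paper.
    """

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        outs, out = [], x
        for stage in self.stages:
            out = stage(out)      # cascaded: each stage refines the previous one
            outs.append(out)      # keep every stage's multi-scale response
        return self.fuse(torch.cat(outs, dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 64, 64, 128)
    print(CascadedDilatedConvs(64)(x).shape)  # torch.Size([1, 64, 64, 128])
```

Fusing every intermediate output, rather than only the last one, is what lets a single module expose several effective receptive fields at once.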
Highlights
Semantic segmentation is a fundamental computer vision task in which an explicit category label is assigned to each pixel of an input image; it is used in many applications, such as autonomous driving, medical imaging, and video surveillance [1,2,3,4]
We propose a multi-level and multi-scale feature aggregation network (MMFANet) for semantic segmentation in vehicle-mounted scenes
We introduce a multi-level and multi-scale feature aggregation network (MMFANet), which consists of four key components: a modified ResNet-18 [15] built on a cascade dilated convolution module (CDCM), a spatial detail module (SDM), a context aggregation module (CAM), and a decoder
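The following is a purely structural sketch of how four such components might be wired together; the real backbone, SDM, CAM, and decoder designs are not reproduced here, and the placeholder modules and feature routing (shallow features to the SDM, deep features to the CAM) are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMFANetSkeleton(nn.Module):
    """Structural sketch only: placeholder modules stand in for the real
    backbone, SDM, CAM, and decoder, and the routing of shallow vs. deep
    features is an assumption made for illustration."""

    def __init__(self, backbone, sdm, cam, decoder):
        super().__init__()
        self.backbone, self.sdm, self.cam, self.decoder = backbone, sdm, cam, decoder

    def forward(self, x):
        shallow, deep = self.backbone(x)        # assumed (detail, semantic) outputs
        detail = self.sdm(shallow)              # capture spatial details
        context = self.cam(deep)                # aggregate high-level semantics
        logits = self.decoder(detail, context)  # fuse and classify per pixel
        return F.interpolate(logits, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

if __name__ == "__main__":
    class _TwoScaleBackbone(nn.Module):
        def __init__(self):
            super().__init__()
            self.stem = nn.Conv2d(3, 32, 3, stride=4, padding=1)
            self.deep = nn.Conv2d(32, 64, 3, stride=4, padding=1)
        def forward(self, x):
            s = self.stem(x)
            return s, self.deep(s)

    class _Decoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.head = nn.Conv2d(32 + 64, 19, 1)  # 19 Cityscapes classes
        def forward(self, detail, context):
            context = F.interpolate(context, size=detail.shape[-2:],
                                    mode="bilinear", align_corners=False)
            return self.head(torch.cat([detail, context], dim=1))

    net = MMFANetSkeleton(_TwoScaleBackbone(), nn.Identity(), nn.Identity(), _Decoder())
    print(net(torch.randn(1, 3, 256, 512)).shape)  # torch.Size([1, 19, 256, 512])
```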
Summary
Convolutional neural networks (CNNs) are inherently limited by the design of the neurons at each layer: the receptive field is restricted to a fixed region, which limits the ability to represent multi-scale features. Pioneering work on multi-scale feature representation constructs an image pyramid (Figure 1a), in which small object details and long-range context are obtained from large-scale and small-scale inputs, respectively [7,9,10]; however, this approach is time-consuming. Because neurons in different layers have different receptive fields, the features extracted from various layers of an encoder implicitly contain information at different scales. A more efficient method builds a spatial pyramid feature extraction module (Figure 1c), such as the pyramid pooling module [8] or atrous spatial pyramid pooling (ASPP).
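As a reference point for the spatial pyramid approach mentioned above, the sketch below implements a simplified ASPP-style block: parallel 3×3 dilated convolutions with different rates plus an image-level pooling branch, projected back to a single feature map. The dilation rates and channel widths are illustrative rather than taken from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleASPP(nn.Module):
    """Simplified atrous spatial pyramid pooling (ASPP) block.

    Parallel dilated convolutions sample the same feature map at several
    effective receptive fields; a global-pooling branch adds image-level
    context. Rates and widths here are illustrative only.
    """

    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        # 1x1 branch plus one 3x3 dilated branch per rate.
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, 1, bias=False)])
        self.branches.extend(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        )
        # Image-level context branch.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 256, 32, 64)
    print(SimpleASPP(256, 128)(x).shape)  # torch.Size([1, 128, 32, 64])
```

Compared with an image pyramid, the input is processed only once and the multiple receptive fields come from the parallel dilated branches, which is why this style of module is cheaper at inference time.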