Abstract
Downsampling layers are essential for convolutional neural network-based semantic segmentation methods to widen their receptive fields. However, because fine-grained information is lost in these layers, the accuracy of such methods is limited. The need for downsampling layers can be eliminated by using a transformer encoder; nevertheless, removing them inevitably increases the computational cost of the network. In this paper, we present a mask transformer layer that reduces the computational cost of any transformer-based network when substituted for a vanilla transformer layer. Additionally, we introduce an aggregation scheme that merges the masked outputs and enhances prediction accuracy. Our method aggregates intermediate outputs to generate the final output, where the number of intermediate outputs contributing to each area depends on that area's importance. With this strategy, we achieve different computational cost levels by modulating the threshold used to determine importance. Our method comprises the following steps. First, we split the transformer encoder into several blocks and attach a segmentation decoder to each block to estimate an intermediate segmentation output. On the basis of the intermediate outputs and predefined thresholds, we classify unnecessary image patches and remove them from subsequent blocks. By progressively masking unnecessary patches, we obtain multiple intermediate outputs for important areas; aggregating them yields better segmentation accuracy at a lower computational burden. In addition, we determine the most effective training scheme and devise a threshold-search algorithm to optimally set the threshold hyperparameters. Extensive experiments on the ADE20K, Cityscapes, and Pascal-Context datasets verify the efficacy of our design, which surpasses the accuracy of the baseline method at a lower computational cost.
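The sketch below illustrates the progressive-masking and aggregation idea described above: the encoder is split into blocks, a lightweight decoder after each block produces an intermediate prediction, confident patches are masked out of later blocks, and each patch's final output averages the intermediate predictions it received. All module names, shapes, and the confidence-based masking criterion are illustrative assumptions rather than the authors' released implementation.

```python
# Minimal sketch of progressive patch masking with intermediate-output aggregation.
# Block structure, decoder design, and the confidence criterion are assumptions
# made for illustration, not the paper's exact implementation.
import torch
import torch.nn as nn


class ProgressiveMaskingSegmenter(nn.Module):
    def __init__(self, embed_dim=256, num_classes=19, num_blocks=4, threshold=0.9):
        super().__init__()
        # Each "block" stands in for a group of transformer encoder layers.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
            for _ in range(num_blocks)
        )
        # A lightweight decoder per block yields an intermediate per-patch prediction.
        self.decoders = nn.ModuleList(
            nn.Linear(embed_dim, num_classes) for _ in range(num_blocks)
        )
        self.threshold = threshold  # confidence above which a patch is masked out

    def forward(self, tokens):
        # tokens: (batch, num_patches, embed_dim) patch embeddings
        B, N, _ = tokens.shape
        num_classes = self.decoders[0].out_features
        active = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
        logit_sum = torch.zeros(B, N, num_classes, device=tokens.device)
        count = torch.zeros(B, N, 1, device=tokens.device)

        for block, decoder in zip(self.blocks, self.decoders):
            # Only still-active ("important") patches are refined by later blocks.
            # This sketch runs the block on all tokens and masks afterwards; a real
            # implementation would gather the active tokens to actually save compute.
            tokens = torch.where(active.unsqueeze(-1), block(tokens), tokens)
            logits = decoder(tokens)

            # Accumulate intermediate outputs only while a patch is active, so
            # important areas receive more predictions to aggregate.
            logit_sum = logit_sum + logits * active.unsqueeze(-1).float()
            count = count + active.unsqueeze(-1).float()

            # Patches whose prediction is already confident are removed from
            # subsequent blocks; the threshold modulates the compute level.
            confidence = logits.softmax(dim=-1).amax(dim=-1)
            active = active & (confidence < self.threshold)

        # Aggregate by averaging the intermediate logits each patch received.
        return logit_sum / count.clamp(min=1)


if __name__ == "__main__":
    model = ProgressiveMaskingSegmenter()
    patches = torch.randn(2, 196, 256)   # e.g. a 14x14 patch grid
    out = model(patches)                  # (2, 196, num_classes)
    print(out.shape)
```

Raising the threshold keeps more patches active for longer (higher accuracy, higher cost), while lowering it masks patches earlier, which is the lever the abstract describes for trading accuracy against computation.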