A real-time semantic segmentation model using iteratively shared features in multiple sub-encoders

Tanmay Singha,Duc-Son Pham,Aneesh Krishna

doi:10.1016/j.patcog.2023.109557

Abstract

Recent studies show a significant growth in semantic segmentation. However, many semantic segmentation models still have a large number of parameters, making them unsuitable for resource-constrained embedded devices. To address this issue, we propose an efficient Shared Feature Reuse Segmentation (SFRSeg) model containing several novelties: a new yet effective shared-branch multiple sub-encoders design, a context mining module and a semantic aggregating module for better context granularity. In particular, our shared-branch approach improves the entire feature hierarchy by sharing the spatial and context knowledge in both shallow and deep branches. After every shared point in each sub-encoder, a proposed cascading context mining (CCM) module is deployed to filter out the noisy spatial details from the feature maps and provides a diverse size of receptive fields for capturing the latent context between multi-scale geometric shapes in the scene. To overcome the gradient vanishing issue at the early stage, we reduce the number of layers in the first sub-encoder and employ a unique multiple sub-encoders design which reprocesses the rich global feature maps through multiple sub-encoders for better feature refinement. Later, the rich semantic features generated by the efficient sub-encoders at different levels are fused by the proposed Hybrid Path Attention Semantic Aggregation (HPA-SA) module that effectively reduces the semantic gap between feature maps at different levels and alleviate the well-known boundary degeneration effect. To make it computationally efficient for resource-constrained embedded devices, a series of lightweight methods such as a lightweight encoder, a squeeze-and-excitation design, separable convolution filters, channel reduction (CR) are carefully exploited. With an exceptional performance on Cityscapes (70.6% test mIoU) and CamVid (74.7% test mIoU) data sets, the proposed model is shown to be superior over existing light real-time semantic segmentation models whilst having only 1.6 million parameters.

Full Text