Abstract

To meet inference-speed requirements, most real-time semantic segmentation networks are shallow. The limited depth restricts the receptive field, so the model captures too little semantic information, which leads to intraclass inconsistency and ultimately lowers segmentation accuracy. Shallow depth also weakens the network's feature extraction capability, reducing its robustness and its ability to adapt to complex scenes. To address these issues, a bilateral network with a rich semantic extractor (RSE) for real-time semantic segmentation (BRSeNet) is presented. First, to remedy insufficient semantic feature extraction, an RSE is proposed, comprising a multiscale global semantic extraction module (MGSEM) and a semantic fusion module (SFM). The MGSEM extracts rich global semantics and expands the effective receptive field, while the SFM efficiently integrates multiscale local semantics with multiscale global semantics, giving the network more comprehensive semantic information. Second, based on the characteristics of the detail and semantic branches, a bilateral reconstruction aggregation module is designed to reconstruct the contextual information of detail features, model the interdependencies among semantic feature channels, and enhance feature representation. Comprehensive experiments are conducted on the challenging Cityscapes and ADE20K datasets. The results show that BRSeNet achieves a mean intersection over union (mIoU) of 74.9% on Cityscapes and 35.7% on ADE20K at inference speeds of 74 and 65 frames per second, respectively, striking a favorable balance between segmentation accuracy and inference speed.
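
For readers unfamiliar with the accuracy metric reported above, mean intersection over union can be computed from a per-class confusion matrix. The following is a minimal sketch in plain Python, not the authors' evaluation code. The function names, the flat label lists, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flat label lists.

    ignore_index is an assumed convention for unlabeled pixels.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels without a ground-truth label
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # classes absent from both pred and target are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)  # per-class IoU: 1/2, 2/3, 1
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18, roughly 72.2%. Benchmark evaluations typically accumulate one confusion matrix over the whole validation set before averaging, rather than averaging per-image scores.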
