Abstract

Efficient and accurate semantic segmentation is crucial for scene parsing in autonomous driving. Two-branch networks, which efficiently capture both detailed and semantic information, are widely used in real-time semantic segmentation. This study proposes MRFNet, a network based on the two-branch strategy, to address the trade-off between segmentation accuracy and speed in urban scenes. Many real-time networks do not comprehensively consider contextual information from sub-regions in different directions and at different scales. To address this, the authors propose a Multi-directional Feature Refinement Module (MFRM) with three sub-paths that capture information at different scales and in different directions; MFRM reduces computation by using strip pooling and dilated convolution. The authors also propose a Feature Cross-guide Aggregation Module, which aggregates detailed and contextual information through the mutual guidance of detailed and semantic information. This module guides the extraction of feature maps in a more precise direction. Experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the method, which balances accuracy and inference speed. Specifically, on a single 1080Ti GPU, the method achieves 78.9% mean intersection over union (mIoU) at 144.5 frames per second (FPS) on Cityscapes and 77.4% mIoU at 120.8 FPS on CamVid.
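For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function name `mean_iou`, the toy label arrays, and the `ignore_index=255` convention are assumptions made for illustration, and benchmark results are normally accumulated over a whole dataset rather than a single image.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union computed from a confusion matrix.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels that carry the ignore label in the ground truth.
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that appear in the prediction or ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())

# Toy 2x3 label maps with three classes (hypothetical data).
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
print(mean_iou(pred, target, num_classes=3))
```

Per-class IoU is the diagonal of the confusion matrix divided by the sum of the matching row and column minus that diagonal entry; mIoU is the mean of those per-class values.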
