Road scene segmentation is a fundamental task in autonomous driving. Recent representative scene segmentation methods adopt fully convolutional networks built on an encoder-decoder architecture. However, this framework can lose fine-grained image information during down-sampling, feature extraction, and feature fusion, resulting in blurred boundary details and inconsistent segmentation results. In this work, a road scene segmentation network based on multi-scale feature fusion and context information aggregation is proposed, in which context information guides feature fusion and enhances semantic feature extraction. Three plug-and-play modules are designed to extract multi-scale features with strong semantic information from high-level features, to compensate for the loss of spatial information in the up-sampling stage, and to capture the dependencies among pixels to improve pixel-wise segmentation. Experimental results on CamVid and Cityscapes show that the proposed multi-scale feature fusion and context information aggregation network (MFCANet) achieves satisfactory performance relative to state-of-the-art segmentation methods.
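To make the idea of multi-scale feature extraction concrete, the following is a minimal NumPy sketch of one generic way to build multi-scale features from a single high-level feature map: pool it at several window sizes, upsample each result back to the original resolution, and concatenate along the channel axis. It is not the MFCANet modules themselves, whose design the abstract does not specify; the function names `avg_pool`, `upsample_nearest`, and `multi_scale_fuse`, and the choice of scales, are assumptions made purely for illustration.

```python
import numpy as np


def avg_pool(x, k):
    """Average-pool a (C, H, W) feature map with a k x k window and stride k."""
    c, h, w = x.shape
    if h % k or w % k:
        raise ValueError("H and W must be divisible by the pooling size k")
    # Split each spatial axis into (blocks, k) and average inside every block.
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def upsample_nearest(x, k):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)


def multi_scale_fuse(x, scales=(1, 2, 4)):
    """Concatenate coarse-to-fine views of x along the channel axis.

    Hypothetical illustration: each branch summarises context over a
    k x k neighbourhood, then is restored to the input resolution.
    """
    branches = [upsample_nearest(avg_pool(x, k), k) for k in scales]
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    features = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
    fused = multi_scale_fuse(features)
    print(fused.shape)  # (6, 4, 4): 2 channels for each of the 3 scales
```

In a trained network the fixed concatenation would typically be followed by learned convolutions that weight the branches; this sketch only shows how the multi-scale views are formed and aligned.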