AbstractDeep images can provide rich spatial structure information, which can effectively exclude the interference of illumination and road texture in road scene segmentation and make better use of the prior knowledge of road area. This paper first proposes a new cross‐modal feature maintenance and encouragement network. It includes a quantization statistics module as well as a maintenance and encouragement module for effective fusion between multimodal data. Meanwhile, for the problem that if the road segmentation is performed directly using a segmentation network, there will be a lack of supervised guidance with clear physical meaningful information and poor interpretability of learning features, this paper proposes two road segmentation models based on prior knowledge of deep image: disparity information and surface normal vector information. Then, a two‐branch neural network is used to process the colour image and the processed depth image separately, to achieve the full utilization of the complementary features of the two modalities. The experimental results on the KITTI road dataset and Cityscapes dataset show that the method in this paper has good road segmentation performance and high computational efficiency.