Road detection in autonomous driving environments is challenging due to boundary fuzziness, occlusion, and glare. We believe two factors are instrumental in addressing these challenges and improving detection performance: global context dependency and an effective feature representation that prioritizes important feature channels. To this end, we introduce DTRoadseg, a novel duplex Transformer-based heterogeneous feature fusion network for road segmentation. DTRoadseg uses a duplex encoder to extract heterogeneous features from RGB images and point-cloud depth images. We then introduce a multi-source Heterogeneous Feature Reinforcement Block (HFRB) to fuse the encoded features; it comprises a Heterogeneous Feature Fusion Module (HFFM) and a Reinforcement Fusion Module (RFM). The HFFM leverages the self-attention mechanism of Transformers to fuse the two modalities through token interactions, while the RFM emphasizes informative feature channels and downplays less important ones, thereby reinforcing the fusion. Finally, a Transformer decoder produces the semantic prediction. In addition, we employ a boundary loss function to refine segmented region boundaries, reduce false detections, and improve model accuracy. Extensive experiments on the KITTI road dataset show that DTRoadseg outperforms state-of-the-art methods, achieving an average accuracy of 97.01% and a recall of 96.35% while running at 0.09 s per image.
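
To make the fusion pipeline concrete, the following is a minimal PyTorch-style sketch of the HFFM/RFM idea as described in the abstract: self-attention over concatenated RGB and depth tokens (token-interaction fusion), followed by channel reweighting that emphasizes informative channels. The shapes, hyperparameters, and internal structure are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the HFFM/RFM fusion described in the abstract.
# All shapes, names, and hyperparameters are illustrative assumptions;
# the paper's actual module definitions are not given here.
import torch
import torch.nn as nn

class HFFM(nn.Module):
    """Fuses RGB and depth tokens via Transformer self-attention."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens):
        # Concatenate tokens from both modalities so self-attention can
        # exchange information across them (token interaction).
        tokens = torch.cat([rgb_tokens, depth_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(fused + tokens)

class RFM(nn.Module):
    """Reweights channels to emphasize informative features (SE-style gate)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, tokens):
        # Average over tokens -> per-channel importance weights in [0, 1].
        weights = self.gate(tokens.mean(dim=1, keepdim=True))
        return tokens * weights

# Usage: fuse per-stage encoder features from the two modalities.
rgb = torch.randn(2, 196, 256)    # (batch, tokens, channels) from RGB encoder
depth = torch.randn(2, 196, 256)  # from point-cloud depth encoder
fused = RFM(256)(HFFM(256)(rgb, depth))
print(fused.shape)  # torch.Size([2, 392, 256])
```

In this sketch the two token streams are simply concatenated before attention; the paper's HFRB may interleave or cross-attend the modalities differently, and the SE-style gate stands in for whatever reinforcement mechanism the RFM actually uses.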