ABSTRACT Road detection in remote sensing (RS) images plays a critical role in applications ranging from urban planning to autonomous navigation. However, accurate road extraction remains challenging due to texture-similar objects that can be visually confused with roads, and shadows that obscure road features. To this end, we present a novel end-to-end network that leverages multi-feature fusion and multi-attention. Multi-level dual residual blocks are introduced to capture multi-scale and multi-level road features. A channel attention feature fusion module fuses road features in the decoder while suppressing coarse-grained noise. Moreover, a difference channel feature fusion module filters out interfering texture-similar features and irrelevant background features. Subsequently, a spatial channel feature enhancement module is designed to identify mis-segmented regions and reclassify their pixels. Extensive experiments on diverse datasets demonstrate the effectiveness of our approach in improving road detection accuracy, especially in areas with shadows and texture-similar objects. The results indicate that our approach outperforms state-of-the-art methods in handling texture-similar objects and shadow-occluded roads, making it a valuable contribution to road extraction in RS images.
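The abstract does not give implementation details of the channel attention feature fusion module. As a rough illustration of the general channel-attention idea it builds on (per-channel reweighting of a feature map via a learned gate), here is a minimal squeeze-and-excitation-style sketch in NumPy; the function name, the two-layer bottleneck gating, and the reduction ratio are assumptions for illustration, not the authors' actual module.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Illustrative squeeze-and-excitation-style channel attention.

    features: (C, H, W) feature map
    w1: (C // r, C) squeeze weights, w2: (C, C // r) excitation weights
    Returns the feature map reweighted per channel.
    """
    # Squeeze: global average pooling over spatial dims -> (C,)
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU) + sigmoid gate -> weights in (0, 1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    # Reweight each channel; low-weight channels are suppressed,
    # loosely analogous to damping coarse-grained noise.
    return features * s[:, None, None]

# Example: 8 channels, reduction ratio r = 4 (both values are arbitrary here)
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
y = channel_attention(x, w1, w2)
```

Because the sigmoid gate lies strictly in (0, 1), every channel is attenuated rather than amplified; real fusion modules typically combine such gates with skip connections so informative features are preserved.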