ABSTRACT Deep learning techniques have been applied to extract road areas from remote sensing images, owing to their efficiency and capacity for automation. However, two issues hinder further improvement in extraction accuracy: the mismatch between the effective receptive field and the required coverage range, and the tension between network depth and the density of preserved geographic information. To address these challenges, we propose a novel semantic segmentation network, D-FusionNet. D-FusionNet integrates a Dilated Convolutional Block (DCB) module, which expands the receptive field while mitigating feature loss, acting like a residual mechanism during encoding. We evaluate the extraction capability of D-FusionNet on GF-2 (Gaofen-2) satellite datasets and the Massachusetts aerial photography dataset. The experimental results demonstrate that D-FusionNet performs well in road extraction tasks. Compared to FCN, UNet, LinkNet, D-LinkNet, and FusionNet, D-FusionNet achieves average improvements of 5.35% in F1-score, 7.12% in IoU (Intersection over Union), and 5.61% in MCC (Matthews Correlation Coefficient) on the GF-2 dataset, and of 2.48% in F1-score, 3.25% in IoU, and 2.25% in MCC on the Massachusetts dataset. This study provides valuable support for road extraction from remote sensing images.
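The receptive-field expansion that dilated convolutions provide, as used in the DCB module, can be illustrated with a small calculation. The kernel sizes and dilation rates below are illustrative assumptions, not values taken from this paper:

```python
def receptive_field(kernel_sizes, dilations):
    """Effective receptive field of a stack of stride-1 dilated convolutions.

    Each layer with kernel size k and dilation rate d enlarges the
    receptive field by (k - 1) * d pixels.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three plain 3x3 convolutions (dilation 1): receptive field of 7
print(receptive_field([3, 3, 3], [1, 1, 1]))  # → 7

# Same depth with illustrative dilation rates 1, 2, 4: receptive field of 15
print(receptive_field([3, 3, 3], [1, 2, 4]))  # → 15
```

This is why stacking dilated convolutions lets a network of fixed depth cover a much wider context without adding parameters, which is the motivation for the DCB.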