Abstract

Automatic road extraction from very-high-resolution remote sensing images has become a popular topic in a wide range of fields. Convolutional neural networks are often used for this purpose. However, many network models do not achieve satisfactory extraction results because of the elongated nature and varying sizes of roads in images. To improve the accuracy of road extraction, this paper proposes a deep learning model based on the structure of Deeplab v3. It incorporates a squeeze-and-excitation (SE) module to apply weights to different feature channels, and performs multi-scale upsampling to preserve and fuse shallow and deep information. To address the problems associated with unbalanced road samples in images, different loss functions and backbone network modules are tested in the model’s training process. Compared with cross entropy, dice loss can improve the performance of the model during training and prediction. The SE module is superior to ResNext and ResNet in improving the integrity of the extracted roads. Experimental results obtained using the Massachusetts Roads Dataset show that the proposed model (Nested SE-Deeplab) improves F1-Score by 2.4% and Intersection over Union by 2.0% compared with FC-DenseNet. The proposed model also achieves better segmentation accuracy in road extraction than other mainstream deep-learning models, including Deeplab v3, SegNet, and UNet.
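The contrast between cross entropy and dice loss on unbalanced road samples can be illustrated with a toy example. The sketch below is a minimal illustration, not the paper's implementation: it assumes one common soft dice formulation, and the function names, the smoothing term `eps`, and the 100-pixel tile are hypothetical choices made for this example.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft dice loss over flat lists of road probabilities and 0/1 labels.

    Assumed formulation: 1 - 2*|P.T| / (|P| + |T|), with a small eps
    (an assumption of this sketch) to avoid division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Pixel-averaged binary cross entropy; probabilities are clipped for stability."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# Hypothetical unbalanced tile: 5 road pixels, 95 background pixels.
targets = [1] * 5 + [0] * 95
# A prediction that labels nearly everything as background.
all_background = [0.01] * 100
# A prediction that recovers the road pixels.
finds_roads = [0.9] * 5 + [0.01] * 95

for name, probs in [("all background", all_background), ("finds roads", finds_roads)]:
    print(name, round(binary_cross_entropy(probs, targets), 4), round(dice_loss(probs, targets), 4))
```

In this toy case the pixel-averaged cross entropy of the all-background prediction stays fairly small because background pixels dominate the average, while its dice loss stays close to 1 because almost no road pixels overlap.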

Highlights

  • The recent, continuously expanding use of remote-sensing big data [1,2,3] has made very-high-resolution (VHR) images a vital geographic information data source because of their wide coverage and high accuracy

  • Tao et al. [41] proposed a spatial information inference structure to address the extraction of roads occluded by other objects in remote sensing images. The structure can learn both the local and global structural information of roads, and using it improved the continuity and accuracy of road extraction compared with other models

  • Its input and output have the same number of feature channels, and the real number has a corresponding global receptive field, which represents the spatial distribution of the corresponding features of this channel
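The last highlight reads like a description of the squeeze step of a squeeze-and-excitation (SE) block, in which each channel is reduced to one real number that summarizes the whole feature map. The following is a minimal sketch under that assumption, written in plain Python without biases. The function name `se_block`, the weight matrices `w1` and `w2`, and the toy values are hypothetical and do not come from the paper.

```python
import math


def se_block(feature_maps, w1, w2):
    """Reweight channels in the style of a squeeze-and-excitation block.

    feature_maps: list of C channels, each an H x W list of lists.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling turns each channel into one real number,
    # so that number has a global receptive field over the channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[v * wt for v in row] for row in ch]
        for ch, wt in zip(feature_maps, weights)
    ]
    return scaled, weights


# Toy input: two 2x2 channels, reduced to one hidden unit (hypothetical values).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, -1.0]]
w2 = [[2.0], [-2.0]]
scaled, weights = se_block(features, w1, w2)
print(weights)
```

The output keeps the same number of channels as the input; only the per-channel scale changes.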


Summary

Introduction

The recent, continuously expanding use of remote-sensing big data [1,2,3] has made very-high-resolution (VHR) images a vital geographic information data source because of their wide coverage and high accuracy. Tao et al. [41] proposed a spatial information inference structure to address the extraction of roads occluded by other objects in remote sensing images. The structure can learn both the local and global structural information of roads, and using it improved the continuity and accuracy of road extraction compared with other models. Xie et al. [43] combined the efficient LinkNet with a Middle Block to develop the HsgNet model, which uses global semantic information, long-distance spatial information and relationships, and information from different channels to improve road extraction performance with fewer parameters than D-LinkNet. An important aspect of research on road detection is center line extraction, which is not limited to semantic segmentation. The structure of the model is based on Deeplab v3, with the

Nested SE-Deeplab
Model Encoder and Decoder
Selection of Loss Functions
Selection of Backbone Networks and Modules
Comparison with State-of-the-Art
Parameter Settings
Evaluation Indexes
Methods
Model Comparison
Discussion
Conclusions
