Abstract

Building segmentation in remote sensing images, which is essential in land use and urban planning, is evolving with advancements in deep learning. Conventional methods based on convolutional neural networks are limited in integrating local and global information and in establishing long-range dependencies, resulting in suboptimal segmentation in complex scenarios. This paper proposes LMSwin_PNet, a novel segmentation network that addresses the SwinTransformer encoder's deficiency in local information processing through a local feature extraction module. It also features a multiscale nonparametric merging attention module to enhance feature-channel correlations. In addition, the network incorporates a pyramid large-kernel convolution module that replaces the traditional 3 × 3 convolution in the decoder with multibranch large-kernel convolution, thereby achieving a large receptive field while capturing detailed information. Comparative analyses on three public building datasets demonstrated the model's superior segmentation performance and robustness. LMSwin_PNet produced outputs closely matching the labels, indicating its potential for broader application in remote sensing image segmentation tasks. It achieved an IoU of 72.35% on the Massachusetts Building Dataset, 91.30% on the WHU Building Dataset, and 78.99% on the Inria aerial-image building dataset. The source code will be freely available at https://github.com/ziyanpeng/pzy.
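
For readers unfamiliar with the reported metric, intersection over union (IoU) for a binary building mask is the number of pixels marked as building in both the prediction and the label, divided by the number marked as building in either. The following is a minimal sketch of that standard definition only. The function name `binary_iou`, the nested-list mask format, and the empty-mask convention are assumptions made for illustration; the abstract does not state how the paper aggregates IoU across images or datasets.

```python
def binary_iou(pred, label):
    """Standard IoU for two same-sized binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the LMSwin_PNet code.
    """
    intersection = 0
    union = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            p_on, t_on = bool(p), bool(t)
            if p_on and t_on:
                intersection += 1  # pixel is building in both masks
            if p_on or t_on:
                union += 1  # pixel is building in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: 2 pixels overlap, 4 pixels are building in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
label = [[1, 0, 0],
         [0, 1, 1]]
print(f"IoU = {binary_iou(pred, label):.2%}")  # prints "IoU = 50.00%"
```

Under this definition, a reported IoU of 91.30% means that the overlap between predicted and labeled building pixels covers 91.30% of their combined area.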
