Abstract

Building extraction is significant in urban planning, economic evaluation, and driverless technology development. However, automatic building extraction from high spatial resolution remote sensing images remains a challenging task because of the variety of building shapes and colors, imaging conditions, and complex background objects. Current building extraction methods are generally based on deep convolutional networks, and most use an encoder-decoder architecture, in which detailed building features and small buildings are easily lost over successive convolution operations. Moreover, buildings with blurred boundaries are difficult to extract completely. To meet these challenges, we propose a multi-task frequency-spatial learning Transformer architecture to extract buildings from high spatial resolution remote sensing images. Unlike current architectures, we design a frequency-spatial learning module within the multi-task framework to synthesize the multi-scale spatial features and frequency decomposition features of high-resolution images. We propose spiking convolution, which mimics neural transmission in the human brain, to enhance the frequency features of buildings. In this way, multi-scale building features can be better preserved and distinguished from background objects. In addition, a masked-attention Transformer is adopted to improve the accuracy of multi-scale building mask prediction by synthesizing successive pixel-wise up-sampled feature maps. We also propose a strategy to evaluate the practical transferability of the proposed method: it mimics practical application cases by training and evaluating on images with different spatial resolutions from different study areas and datasets. Experiments on five public building datasets (WHU-Building Satellite Dataset I, WHU-Building Satellite Dataset II, Massachusetts Buildings Dataset, Inria Aerial Image Dataset, and xBD Building Dataset) demonstrate the strong potential of our proposed method for practical application cases. Our method outperforms five recently proposed state-of-the-art semantic segmentation methods, with a 36.60% accuracy improvement on extracted buildings and an approximately 53.55% recall improvement in extracting small building instances. The implementation code will be released after the paper is published.
