Abstract

Deep learning methods have achieved considerable progress in remote sensing image building extraction. Most building extraction methods are based on Convolutional Neural Networks (CNN). Recently, vision transformers have provided a better perspective for modeling long-range context in images, but usually suffer from high computational complexity and memory usage. In this paper, we explored the potential of using transformers for efficient building extraction. We design an efficient dual-pathway transformer structure that learns the long-term dependency of tokens in both their spatial and channel dimensions and achieves state-of-the-art accuracy on benchmark building extraction datasets. Since single buildings in remote sensing images usually only occupy a very small part of the image pixels, we represent buildings as a set of “sparse” feature vectors in their feature space by introducing a new module called “sparse token sampler”. With such a design, the computational complexity in transformers can be greatly reduced over an order of magnitude. We refer to our method as Sparse Token Transformers (STT). Experiments conducted on the Wuhan University Aerial Building Dataset (WHU) and the Inria Aerial Image Labeling Dataset (INRIA) suggest the effectiveness and efficiency of our method. Compared with some widely used segmentation methods and some state-of-the-art building extraction methods, STT has achieved the best performance with low time cost.

Highlights

  • Building extraction from remote sensing images refers to the automatic process of identifying building and non-building pixels in remote sensing images

  • We propose an efficient method for building extraction based on transformers

  • For better understanding the candidate positions in the sparse token sampler, we show some examples of the spatial probabilistic maps As produced by SST( R18, S4) from Wuhan University Aerial Building Dataset (WHU) and Inria Aerial Image Labeling Dataset (INRIA) building datasets

Read more

Summary

Introduction

Building extraction from remote sensing images refers to the automatic process of identifying building and non-building pixels in remote sensing images. Building extraction plays an important role in many applications, such as urban planning [1], population estimation [2,3], economic activities distribution [4], disaster reporting [5], illegal building construction, and so forth. It can be used as an essential prerequisite for downstream tasks like change detection from remote sensing images [6,7]. Automatic building extraction has attracted increasing attention in both academia and industry. It is usually difficult to properly extract complete building boundaries from the complex background

Methods
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.