Abstract

Building footprint extraction from remote sensing imagery remains challenging due to the diverse appearance of buildings and confusing background scenarios. Recently, researchers have revealed that both globality and locality are vitally important in building footprint extraction and have proposed incorporating local context and global long-range dependencies into segmentation models. However, inadequate integration of globality and locality still leads to incomplete, false, or missing extraction results. To alleviate these problems, a novel segmentation method named Bi-branch Cross-fusion Transformer Network (BCTNet) is proposed in this study. Two parallel encoder branches, a convolutional encoder branch (CB) and a transformer encoder branch (TB), are designed to extract multi-scale feature maps. A concatenation-then-cross-fusion transformer block (CCTB) is put forward to integrate the locality from the CB and the globality from the TB in a cross-fusion manner at each stage of the encoding process. Then, an adaptive gating module (AGM) is proposed to gate the feature maps from the CCTB, strengthening important features while suppressing irrelevant interference. After that, the segmentation results are obtained through a simple decoding process. Comprehensive experiments on two benchmark datasets demonstrate that the proposed BCTNet achieves superior performance compared to current state-of-the-art (SOTA) segmentation methods.
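
To make the described pipeline concrete, below is a minimal PyTorch sketch of the bi-branch idea outlined in the abstract: a convolutional branch for locality, a transformer branch for globality, a concatenation-then-cross-fusion block, and an adaptive gating module, followed by a simple decoding head. All layer widths, attention settings, module internals, and the MiniBCTNet wrapper are illustrative assumptions; this is not the authors' BCTNet implementation.

```python
# Hypothetical sketch of a bi-branch cross-fusion segmentation network.
# Module names follow the abstract (CB, TB, CCTB, AGM); their internals
# are assumptions made for illustration only.
import torch
import torch.nn as nn


class ConvBranchStage(nn.Module):
    """One stage of the convolutional encoder branch (CB): local features."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class TransformerBranchStage(nn.Module):
    """One stage of the transformer encoder branch (TB): long-range context."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.attn = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=heads, dim_feedforward=out_ch * 2,
            batch_first=True)

    def forward(self, x):
        x = self.down(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        tokens = self.attn(tokens)                     # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class CCTB(nn.Module):
    """Assumed form of the concatenation-then-cross-fusion block: local
    features query the concatenated (local + global) representation."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.proj = nn.Conv2d(2 * ch, ch, 1)
        self.cross = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, local_feat, global_feat):
        b, c, h, w = local_feat.shape
        fused = self.proj(torch.cat([local_feat, global_feat], dim=1))
        q = local_feat.flatten(2).transpose(1, 2)      # queries: locality
        kv = fused.flatten(2).transpose(1, 2)          # keys/values: fused context
        out, _ = self.cross(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)


class AGM(nn.Module):
    """Assumed adaptive gating module: channel gate re-weighting fused
    features to suppress irrelevant responses."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)


class MiniBCTNet(nn.Module):
    """Single-stage toy version of the bi-branch encoder with a naive head."""
    def __init__(self, ch=32):
        super().__init__()
        self.cb = ConvBranchStage(3, ch)
        self.tb = TransformerBranchStage(3, ch)
        self.cctb = CCTB(ch)
        self.agm = AGM(ch)
        self.head = nn.Sequential(
            nn.Conv2d(ch, 1, 1),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False))

    def forward(self, x):
        fused = self.agm(self.cctb(self.cb(x), self.tb(x)))
        return self.head(fused)  # building-probability logits


if __name__ == "__main__":
    logits = MiniBCTNet()(torch.randn(1, 3, 128, 128))
    print(logits.shape)  # torch.Size([1, 1, 128, 128])
```

A full implementation would repeat the CB/TB/CCTB/AGM pattern at several encoder stages and fuse the multi-scale feature maps in the decoder; the single-stage wrapper here only shows how the pieces connect.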
