Water index and Swin Transformer Ensemble (WISTE) for water body extraction from multispectral remote sensing images

Donghui Ma, Liguang Jiang, Jie Li, Yun Shi

doi:10.1080/15481603.2023.2251704

Abstract

Automatic mapping of surface water bodies from remote sensing data is valuable for studying inland water dynamics at regional to global scales. Convolutional neural networks (CNNs) have become an efficient semantic segmentation technique for interpreting remote sensing images. However, the receptive field of a CNN is limited by the convolutional kernel size, so the network focuses mainly on local features. The Swin Transformer has recently demonstrated outstanding performance in computer vision tasks and could be useful for processing multispectral remote sensing images. In this article, we propose a Water Index and Swin Transformer Ensemble (WISTE) method for automatic water body extraction. First, a dual-branch encoder architecture is designed for the Swin Transformer, aggregating the pixel neighbor relationships captured by a fully convolutional network (FCN) with the global semantic information captured by multihead self-attention. Second, to prevent the Swin Transformer from ignoring multispectral information, we construct a prediction map ensemble module in which the predictions of the Swin Transformer and the Normalized Difference Water Index (NDWI) are combined by a Bayesian averaging strategy. Finally, experimental results on two distinct datasets demonstrate that WISTE has advantages over other segmentation methods and achieves the best results among the methods compared. The proposed method can be used to improve regional to continental surface water mapping and related hydrological studies.
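To make the prediction map ensemble idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common green/near-infrared definition of NDWI, (Green - NIR) / (Green + NIR), since the abstract does not state which bands are used. The logistic mapping from NDWI to a probability and the fixed weight `w_model` are hypothetical stand-ins for the Bayesian averaging strategy, whose details are not given in the abstract. All function and parameter names are illustrative.

```python
import numpy as np


def ndwi(green, nir, eps=1e-8):
    """NDWI from green and near-infrared reflectance (assumed band choice)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    # eps avoids division by zero where both bands are zero.
    return (green - nir) / (green + nir + eps)


def ndwi_to_probability(index, threshold=0.0, steepness=10.0):
    """Hypothetical logistic mapping of NDWI to a water probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-steepness * (index - threshold)))


def ensemble_water_probability(p_model, p_index, w_model=0.5):
    """Weighted average of two probability maps.

    Assumption: a fixed weight stands in for the Bayesian averaging
    strategy named in the abstract, which is not specified there.
    """
    p_model = np.asarray(p_model, dtype=np.float64)
    p_index = np.asarray(p_index, dtype=np.float64)
    return w_model * p_model + (1.0 - w_model) * p_index


def water_mask(p_model, green, nir, w_model=0.5, cutoff=0.5):
    """Binary water mask from a network probability map and two bands."""
    p_index = ndwi_to_probability(ndwi(green, nir))
    return ensemble_water_probability(p_model, p_index, w_model) >= cutoff


# Toy 1x2 scene: the first pixel looks like water, the second like land.
green = np.array([[0.3, 0.1]])
nir = np.array([[0.1, 0.3]])
p_swin = np.array([[0.8, 0.2]])  # stand-in for a segmentation network output
mask = water_mask(p_swin, green, nir)
```

In this sketch the spectral index acts as a second opinion on each pixel, so a pixel is labeled water only when the combined evidence from the network and the index exceeds the cutoff.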

Full Text