RCCT-ASPPNet: Dual-Encoder Remote Image Segmentation Based on Transformer and ASPP

Yazhou Li, Linsheng Huang, Zhiyou Cheng, Jinling Zhao, Chuanjian Wang

doi:10.3390/rs15020379

Open Access

Journal: Remote Sensing
Publication Date: Jan 7, 2023
Citations: 13
License type: CC BY 4.0

Affiliation: Anhui University

Abstract

Remote sensing image semantic segmentation is one of the core research topics in computer vision and has a wide range of applications in production and daily life. Most remote sensing image semantic segmentation methods are based on convolutional neural networks (CNNs). More recently, Transformers have made it possible to model long-range dependencies in images. In this paper, we propose RCCT-ASPPNet, a dual-encoder network that combines Residual Multiscale Channel Cross-Fusion with Transformer (RCCT) and Atrous Spatial Pyramid Pooling (ASPP). RCCT uses a Transformer to cross-fuse global multiscale semantic information, and a residual structure then connects its inputs and outputs. The CNN-based ASPP encoder extracts high-level semantic context at multiple receptive fields, and a Convolutional Block Attention Module (CBAM) extracts spatial and channel information to further improve segmentation. Experimental results show that our method achieves an mIoU of 94.14% on the Farmland dataset and 61.30% on AeroScapes, and an mPA of 97.12% and 84.36%, respectively, outperforming both DeepLabV3+ and UCTransNet on both metrics.
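The abstract reports mIoU and mPA without defining them. The following is a minimal sketch, assuming the common per-class definitions (IoU = TP / (TP + FP + FN) and pixel accuracy = TP / (TP + FN), each averaged over classes), that computes both metrics from a pixel-level confusion matrix. The function names are hypothetical, and the paper's own evaluation code may handle absent classes or averaging differently.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Hypothetical helper: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm: List[List[int]]) -> Tuple[float, float]:
    """Hypothetical helper: mean IoU and mean per-class pixel accuracy (assumed definitions)."""
    n = len(cm)
    ious, pas = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        if tp + fp + fn > 0:                          # skip classes absent from both GT and prediction
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:                               # skip classes absent from the ground truth
            pas.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(pas) / len(pas)


# Toy example: six pixels, three classes (flattened label maps).
cm = confusion_matrix([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 2, 2], num_classes=3)
miou, mpa = miou_and_mpa(cm)
print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")  # mIoU = 0.5000, mPA = 0.7222
```

Averaging per class rather than over all pixels keeps small classes from being dominated by large ones, so both metrics are sensitive to how well rare classes are segmented.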
