Abstract
Recently, convolutional neural networks (CNNs) have dominated the ground-based cloud image segmentation task, but they disregard long-range dependencies because of the limited size of their filters. Although Transformer-based methods can overcome this limitation, they learn long-range dependencies only at a single scale and therefore fail to capture the multi-scale information of cloud images. Multi-scale information benefits ground-based cloud image segmentation, because features from small scales tend to capture detailed information while features from large scales can learn global information. In this paper, we propose a novel deep network named Integration Transformer (InTransformer), which builds long-range dependencies at different scales. To this end, we propose the Hybrid Multi-head Transformer Block (HMTB) to learn multi-scale long-range dependencies, and hybridize CNNs and HMTB as the encoder at different scales, so that the encoder extracts multi-scale representations capturing both local information and long-range dependencies. Meanwhile, to fuse patch tokens of different scales, we propose the Mutual Cross-Attention Module (MCAM) for the decoder of InTransformer, which lets multi-scale patch tokens interact adequately in a bidirectional way. We conducted a series of experiments on the large ground-based cloud detection database TLCDD and on SWIMSEG. The experimental results show that our method outperforms other methods, demonstrating the effectiveness of the proposed InTransformer.
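The abstract does not give MCAM's equations, but its stated idea is bidirectional interaction between patch tokens of two scales. The sketch below illustrates one plausible reading with standard scaled dot-product cross-attention in NumPy: tokens from the small scale attend to the large scale and vice versa, with residual fusion. All names (`mutual_cross_attention`, the random projection matrices) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    # Queries come from one scale; keys/values come from the other scale.
    q, k, v = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product
    return softmax(scores) @ v

def mutual_cross_attention(small_tokens, large_tokens, d):
    # Hypothetical MCAM sketch: both directions of cross-attention,
    # each fused back into its own scale via a residual connection.
    rng = np.random.default_rng(0)
    W = lambda: rng.standard_normal((d, d)) / np.sqrt(d)  # toy projections
    small_out = small_tokens + cross_attention(small_tokens, large_tokens, W(), W(), W())
    large_out = large_tokens + cross_attention(large_tokens, small_tokens, W(), W(), W())
    return small_out, large_out
```

Each scale keeps its own token count and embedding width; only the attention context comes from the other scale, which is what makes the fusion bidirectional rather than a one-way skip connection.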
Published in: IEEE Transactions on Geoscience and Remote Sensing