Abstract
Convolutional neural networks (CNNs) have contributed significantly to hyperspectral image (HSI) generation. However, the local receptive fields of CNNs make it difficult to capture long-range dependencies, which can introduce distortions in fused images. Transformers excel at capturing long-range dependencies but have limited capacity for handling fine details. Moreover, prior work has often overlooked the extraction of global features during the image preprocessing stage, risking the loss of fine details. To address these issues, we propose a hybrid cross-multiscale spectral-spatial Transformer (HCMSST) that combines the strengths of CNNs in feature extraction with those of Transformers in capturing long-range dependencies. To fully extract and retain local and global information in the shallow feature extraction phase, the network incorporates CNNs with a staggered cascade-dense residual block (SCDRB). This block employs staggered residuals to establish direct connections both within and between branches, and it integrates attention modules to strengthen the response to important features. This design enables unrestricted information exchange and fosters deeper feature representations. To overcome the limitations of Transformers in processing fine details, we introduce multiscale spatial-spectral encoding-decoding structures that produce comprehensive spatial-spectral features, whose long-range dependencies are then captured by the cross-multiscale spectral-spatial Transformer (CMSST). Furthermore, the CMSST adopts a cross-level dual-stream feature interaction strategy that integrates spatial and spectral features from different levels and feeds the fused features back to their corresponding branches for information interaction. Experimental results indicate that the proposed HCMSST outperforms many state-of-the-art (SOTA) methods: it reduces the ERGAS metric by 3.05% relative to the SOTA methods on the CAVE dataset and achieves a 2.69% reduction in ERGAS on the Harvard dataset.
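To make the cross-level dual-stream feature interaction strategy concrete, the sketch below shows one plausible realization in PyTorch: spatial and spectral token streams from two levels are first fused per branch, each branch then queries the other via cross-attention, and the result is returned to its originating branch through a residual connection. This is a minimal illustration under stated assumptions, not the authors' implementation; the module name DualStreamInteraction, the use of nn.MultiheadAttention, the concatenation-plus-projection fusion, and all shapes are hypothetical.

```python
# Illustrative sketch of a cross-level dual-stream spectral-spatial
# interaction (assumed design; not the published HCMSST code).
import torch
import torch.nn as nn


class DualStreamInteraction(nn.Module):
    """Fuse spatial and spectral token streams from two levels, exchange
    information via cross-attention, and feed results back to each branch."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Assumed fusion of the two levels: concatenate along the channel
        # axis, then project back to the embedding dimension.
        self.fuse_spa = nn.Linear(2 * dim, dim)
        self.fuse_spe = nn.Linear(2 * dim, dim)
        # Cross-attention: each stream queries the other stream.
        self.spa_from_spe = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spe_from_spa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_spa = nn.LayerNorm(dim)
        self.norm_spe = nn.LayerNorm(dim)

    def forward(self, spa_lo, spa_hi, spe_lo, spe_hi):
        # Inputs: (batch, tokens, dim) spatial/spectral features taken
        # from two different encoder-decoder levels ("lo" and "hi").
        spa = self.fuse_spa(torch.cat([spa_lo, spa_hi], dim=-1))
        spe = self.fuse_spe(torch.cat([spe_lo, spe_hi], dim=-1))
        # Spatial branch attends to spectral tokens and vice versa;
        # residual connections return the fused result to each branch.
        spa_out, _ = self.spa_from_spe(self.norm_spa(spa), spe, spe)
        spe_out, _ = self.spe_from_spa(self.norm_spe(spe), spa, spa)
        return spa + spa_out, spe + spe_out


# Toy usage: batch of 2, 64 tokens per stream, 32-dim embeddings.
if __name__ == "__main__":
    block = DualStreamInteraction(dim=32)
    feats = [torch.randn(2, 64, 32) for _ in range(4)]
    spa, spe = block(*feats)
    print(spa.shape, spe.shape)  # torch.Size([2, 64, 32]) for both
```

The key design point the abstract implies, and that this sketch mirrors, is that fusion happens across levels first and across modalities second, so each branch receives globally enriched features without losing its own identity through the residual path.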