Abstract
The success of vision Transformers (ViTs) relies heavily on the self-attention mechanism, which in turn depends on appropriate patch tokenization. However, hyperspectral images (HSIs) often suffer from significant noise distortions and spectral uncertainty, which lead to ambiguous tokenization and, consequently, to unstable attention patterns and overfitting. In this paper, we propose a neighborhood contrastive tokenization task (NeiCoT) to learn compact, semantically meaningful, and context-sensitive tokens for efficient Transformer encoding. Specifically, we apply a predictor to the patch embedding to maximize the mutual information between local individuals and their global average anchor. This encourages neighboring tokens to be relevant and to participate actively in feature learning. Next, we revise a token-level contrastive loss to align predictions with local individuals and distinguish them from other samples in a mini-batch, thereby enhancing tokens rich in contextual semantics. Furthermore, we apply Gaussian weighting to the token-level contrastive loss to balance the neighborhood contribution. Finally, we propose a sequence-specific MAE framework with NeiCoT for HSI representation learning, and additionally validate NeiCoT on a supervised Transformer backbone. The results demonstrate that NeiCoT consistently enhances the robustness and generalization of the Transformer, achieving accurate object recognition and boundary localization even with limited training samples. Our code will be available at https://github.com/zoegnov07/NeiCoT.
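The Gaussian-weighted, token-level contrastive objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE form, the cosine similarity, the temperature `tau`, the width `sigma`, the identity default for `predictor`, and all function names are assumptions made for illustration. The sketch assumes each sample is a square patch of `side * side` tokens, takes the mean token as the global average anchor, and treats the same token position in the other samples of the mini-batch as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def gaussian_weights(side, sigma=1.0):
    """Normalized Gaussian weights over a side x side token grid.

    Weights decay with squared distance from the grid center (an assumed
    choice of how the neighborhood contribution is balanced).
    """
    center = (side - 1) / 2.0
    raw = [
        math.exp(-((r - center) ** 2 + (c - center) ** 2) / (2 * sigma ** 2))
        for r in range(side)
        for c in range(side)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def neighborhood_contrastive_loss(batch_tokens, side, tau=0.1, sigma=1.0,
                                  predictor=None):
    """Gaussian-weighted, token-level InfoNCE loss (illustrative sketch).

    batch_tokens: list of samples; each sample is a list of side*side
                  token vectors (lists of floats), in row-major order.
    predictor:    hypothetical callable mapping the anchor to a prediction;
                  defaults to the identity.
    """
    predictor = predictor or (lambda v: v)
    weights = gaussian_weights(side, sigma)

    # Global average anchor per sample, passed through the predictor.
    predictions = []
    for tokens in batch_tokens:
        dim = len(tokens[0])
        anchor = [sum(t[k] for t in tokens) / len(tokens) for k in range(dim)]
        predictions.append(predictor(anchor))

    loss = 0.0
    for b in range(len(batch_tokens)):
        for i, w in enumerate(weights):
            # Positive: token i of sample b. Negatives: token i of the others.
            logits = [cosine(predictions[b], other[i]) / tau
                      for other in batch_tokens]
            m = max(logits)  # subtract the max for a stable log-sum-exp
            log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
            loss += w * (log_denom - logits[b])
    return loss / len(batch_tokens)
```

Under these assumptions, the loss approaches zero when each sample's tokens agree with their own anchor and differ from the other samples, and equals log(B) for a batch of B identical samples, since positives and negatives are then indistinguishable.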
More From: International Journal of Applied Earth Observation and Geoinformation