Abstract
Big models, large datasets, and self-supervised learning (SSL) have recently attracted substantial research interest due to their potential to reduce our reliance on annotations. Given the strong generalization ability of self-supervised models reported in the literature, we explore in this letter how helpful SSL can be for a crucial task in remote sensing (RS), image scene classification, when only a few labeled samples are available. We propose a simple prototype-based classification procedure that requires no training or fine-tuning and uses openly available self-supervised features from contrastive language-image pre-training (CLIP). We evaluate our method with these ready-to-use features on four diverse benchmark datasets covering both red-green-blue (RGB) and multispectral (MS) images. It achieves highly competitive accuracy compared with works in similar settings, i.e., those relying on an exceedingly small number of labels. To the best of our knowledge, our model is the first to achieve such high accuracy under such austere label conditions. We further analyze our approach from several perspectives, including its advantages and limitations, the reasons for its surprisingly strong performance, potential applications, and directions for future improvement.
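As a minimal sketch of the training-free, prototype-based procedure the abstract describes: class prototypes are formed by averaging frozen CLIP features of the few labeled samples, and each query image is assigned to the class of its nearest prototype. The model choice (ViT-B/32), the RGB-only image handling, and all helper names below are assumptions for illustration, not the paper's exact pipeline.

```python
# Prototype-based few-shot classification on frozen CLIP features.
# Assumptions: ViT-B/32 backbone and RGB inputs; the paper's own
# setup (including MS handling) may differ.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def embed(paths):
    """Encode a list of image paths into L2-normalized CLIP features."""
    batch = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    feats = model.encode_image(batch).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def build_prototypes(support):
    """support: {class_name: [few labeled image paths]}.
    One prototype per class: the mean of its labeled features."""
    protos = torch.stack([embed(p).mean(dim=0) for p in support.values()])
    return protos / protos.norm(dim=-1, keepdim=True), list(support.keys())

def classify(query_paths, protos, class_names):
    """Assign each query image to its nearest prototype (cosine similarity)."""
    sims = embed(query_paths) @ protos.T  # unit vectors, so dot = cosine
    return [class_names[i] for i in sims.argmax(dim=-1).tolist()]
```

Because no parameters are updated, the only cost is a forward pass per image, which is what makes the procedure usable with an exceedingly small number of labels.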