Abstract
Increasing coverage by commercial and public satellites allows us to monitor the pulse of the Earth at ever-higher frequency (Zhu et al., 2017). Together with the rise of deep learning in artificial intelligence (AI) (LeCun et al., 2015), the field of AI for Earth Observation (AI4EO) is growing rapidly. However, many supervised deep learning techniques are data-hungry: they need large quantities of annotated data to reach their full potential. In many Earth Observation applications, such as change detection, this is often infeasible because high-quality annotations require manual labeling, which is time-consuming and costly. Self-supervised learning (SSL) can help tackle the issue of limited label availability in AI4EO. In SSL, an algorithm is pretrained on tasks that require only the input data, without annotations. Notably, Masked Autoencoders (MAE) have recently shown promising performance: a Vision Transformer learns to reconstruct a full image while seeing only 25% of it as input. We hypothesize that the success of MAEs also extends to satellite imagery and evaluate this on a change detection downstream task. In addition, we provide a multitemporal baseline based on DINO, another widely successful SSL method. Further, we test a second version of MAEs, which we call GeoMAE. GeoMAE incorporates the location and date of the satellite image as auxiliary information in self-supervised pretraining: the coordinates and date are passed to the MAE model as additional tokens, similar to the positional encoding. The pretraining dataset is the RapidAI4EO corpus, which contains multitemporal Planet Fusion imagery for a variety of locations across Europe. The downstream dataset also uses pairs of Planet Fusion images as input. Each pair covers a 600 m × 600 m patch, with the two images taken three months apart, and comes with a label indicating whether the patch changed during that period.
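The abstract does not say exactly how GeoMAE encodes coordinates and dates, so the following is only a minimal sketch under stated assumptions. It shows MAE-style random masking that keeps 25% of the patch tokens for the encoder, and then appends two extra tokens for location and date. The sinusoidal encoding of latitude, longitude, and day of year, the two-token layout, and all function names (`random_masking`, `sinusoidal_encoding`, `auxiliary_tokens`, `build_encoder_input`) are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Token = List[float]


def random_masking(num_patches: int, mask_ratio: float = 0.75,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split patch indices into visible and masked sets, MAE-style.

    Only the visible patches go to the encoder; the decoder is trained
    to reconstruct the masked ones.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def sinusoidal_encoding(value: float, dim: int,
                        max_period: float = 10000.0) -> Token:
    """Transformer-style [sin, cos] pairs of a scalar at dim // 2 frequencies."""
    assert dim % 2 == 0, "dim must be even"
    n_freq = dim // 2
    enc: Token = []
    for i in range(n_freq):
        freq = 1.0 / (max_period ** (i / n_freq))
        enc.extend([math.sin(value * freq), math.cos(value * freq)])
    return enc


def auxiliary_tokens(lat: float, lon: float, day_of_year: int,
                     dim: int) -> List[Token]:
    """Hypothetical layout: one location token and one date token, each of size dim."""
    assert dim % 4 == 0, "dim must be divisible by 4 to split lat/lon evenly"
    half = dim // 2
    location = sinusoidal_encoding(lat, half) + sinusoidal_encoding(lon, half)
    date = sinusoidal_encoding(float(day_of_year), dim)
    return [location, date]


def build_encoder_input(patch_tokens: Sequence[Token], lat: float, lon: float,
                        day_of_year: int, mask_ratio: float = 0.75,
                        seed: int = 0) -> Tuple[List[Token], List[int]]:
    """Keep the visible patch tokens and append the auxiliary tokens.

    Assumes patch_tokens already include their positional embeddings,
    as in a standard MAE, before masking is applied.
    """
    dim = len(patch_tokens[0])
    visible, masked = random_masking(len(patch_tokens), mask_ratio, seed)
    tokens = [list(patch_tokens[i]) for i in visible]
    tokens += auxiliary_tokens(lat, lon, day_of_year, dim)
    return tokens, masked


if __name__ == "__main__":
    # 196 patches (e.g. a 224 x 224 image in 16 x 16 patches), toy dim of 8.
    patches = [[0.0] * 8 for _ in range(196)]
    tokens, masked = build_encoder_input(patches, lat=52.52, lon=13.40,
                                         day_of_year=91)
    print(len(tokens), len(masked))  # 51 147 (49 visible + 2 auxiliary; 147 masked)
```

Appending the auxiliary information as separate tokens, rather than adding it to every patch embedding, lets the encoder attend to location and date explicitly while leaving the masking of image patches unchanged. Both options are plausible readings of "similar to the positional encoding".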
Self-supervised pretraining runs for up to 150 epochs, and we select the pretrained model with the best validation performance on the downstream task as the starting point for the test set evaluation. We find that the regular MAE model scores best on the test set with an accuracy of 81.54%, followed by DINO with 80.63% and GeoMAE with 80.02%. Pretraining MAE on ImageNet data instead of satellite images results in a notable performance drop, to 71.36%. Overall, our current pretraining experiments cannot yet confirm our hypothesis that GeoMAE is advantageous compared to regular MAE. However, in a similar spirit, Cong et al. (2022) recently introduced SatMAE, which shows that for other remote sensing applications the combination of auxiliary information and novel masking strategies is a key factor. Therefore, combining location and time inputs with adapted masking may also hold the most potential for change detection. There is ample room for future research on geo-specific applications of MAEs, and our experimental results for change detection provide a starting point for it.