Abstract

Accurate global mapping of crop types is essential for maintaining food security. In recent years, with the continued launch of Earth observation (EO) satellites, freely accessible EO data with high spatial–temporal resolution have made it possible to map crop types at a finer scale. However, the difficulty of crop type mapping rises dramatically with multi-source data (such as optical and SAR), especially when dealing with long time series across the temporal domain. Most existing crop mapping studies have not exploited the complementary information in multi-modal time series. Therefore, this study proposes a multi-branch self-learning Vision Transformer (MSViT) for crop classification that achieves better spatial–temporal feature extraction by jointly using optical and SAR time series. Experimental results show that the proposed method outperforms the most commonly used deep learning crop classification schemes in terms of overall classification accuracy, Kappa coefficient, and F1-score. In addition, the experiments quantitatively evaluate the impact of model depth, multi-source features, and the contrastive learning strategy on accurate crop identification, and visualize the contributions of spatial–temporal features within the model.
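To make the multi-branch idea concrete, the sketch below shows a minimal two-branch temporal Transformer in PyTorch: each modality (optical and SAR) is embedded and encoded by its own Transformer branch, the temporally pooled branch outputs are concatenated, and a linear head predicts the crop class. This is an illustrative assumption about the general design pattern, not the authors' MSViT; the class name, band counts, and hyperparameters (TwoBranchTemporalTransformer, d_model, depth, and so on) are hypothetical placeholders, and the paper's self-learning (contrastive) component is omitted.

```python
import torch
import torch.nn as nn

class TwoBranchTemporalTransformer(nn.Module):
    """Illustrative two-branch encoder: one Transformer per modality,
    fused by concatenation before a linear classifier. All hyperparameters
    are placeholders, not the values used in the paper."""
    def __init__(self, opt_bands=10, sar_bands=2, d_model=64,
                 n_heads=4, depth=2, n_classes=8):
        super().__init__()
        # Per-timestep embeddings mapping raw band values to d_model features.
        self.opt_embed = nn.Linear(opt_bands, d_model)
        self.sar_embed = nn.Linear(sar_bands, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True)
        # Separate temporal encoder branch for each modality.
        self.opt_encoder = nn.TransformerEncoder(make_layer(), depth)
        self.sar_encoder = nn.TransformerEncoder(make_layer(), depth)
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, opt_seq, sar_seq):
        # opt_seq: (batch, T_opt, opt_bands); sar_seq: (batch, T_sar, sar_bands).
        # Mean over the time dimension is a simple temporal pooling choice.
        z_opt = self.opt_encoder(self.opt_embed(opt_seq)).mean(dim=1)
        z_sar = self.sar_encoder(self.sar_embed(sar_seq)).mean(dim=1)
        # Late fusion: concatenate modality features, then classify.
        return self.head(torch.cat([z_opt, z_sar], dim=-1))

model = TwoBranchTemporalTransformer()
# Four per-pixel samples: 24 optical and 30 SAR acquisitions each.
logits = model(torch.randn(4, 24, 10), torch.randn(4, 30, 2))
print(logits.shape)  # torch.Size([4, 8])
```

Keeping one branch per modality lets each encoder attend over its own acquisition dates, which matters because optical and SAR time series are rarely co-registered in time; fusion then happens only at the pooled-feature level.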
