Accurate and timely crop distribution data are crucial for governments, in order to make related policies to ensure food security. However, agricultural ecosystems are spatially and temporally dynamic systems, which poses a great challenge for accurate crop mapping using fine spatial resolution (FSR) imagery. This research proposed a novel Tri-Dimensional Multi-head Self-Attention Network (TDMSANet) for accurate crop mapping from multitemporal fine-resolution remotely sensed images. Specifically, three sub-modules were designed to extract spectral, temporal, and spatial feature representations, respectively. All three sub-modules adopted a multi-head self-attention mechanism to assign higher weights to important features. In addition, the positional encoding was adopted by both temporal and spatial submodules to learn the sequence relationships between the features in a feature sequence. The proposed TDMSANet was evaluated on two sites utilizing FSR SAR (UAVSAR) and optical (Rapid Eye) images, respectively. The experimental results showed that TDMSANet consistently achieved significantly higher crop mapping accuracy compared to the benchmark models across both sites, with an average overall accuracy improvement of 1.40%, 3.35%, and 6.42% over CNN, Transformer, and LSTM, respectively. The ablation experiments further showed that the three sub-modules were all useful to the TDMSANet, and the Spatial Feature Extraction Module exerted larger impact than the remaining two sub-modules.
Read full abstract