Accurate spatial information on agricultural field parcels is important for agricultural production management and for understanding agro-industrialization and intensification. However, traditional remote sensing methods that rely on single-modal or single-date data struggle to identify heterogeneous field parcels, particularly in regions dominated by smallholder farming systems. To address this challenge, we proposed a Dual-branch Spatiotemporal Fusion Network (DSTFNet) that integrated very high-resolution (VHR) images and medium-resolution satellite image time series (MRSITS) to extract agricultural field parcels over various landscapes. DSTFNet consisted of two branches: a spatial branch that extracted spatial features from VHR images and a temporal branch that explored seasonal spectral dynamics from MRSITS data using ConvLSTM units and an attention module. We evaluated DSTFNet in four regions across China using GF-2 and Sentinel-2 data. The results showed that DSTFNet performed well in delineating agricultural field parcels, achieving the highest Matthews correlation coefficient (MCC = 0.823) for field extent, the highest edge F1-score (Fedge = 0.865) for field boundaries, and the lowest error on the segmentation evaluation index (SEI = 0.191) for the vectorized field parcels in Hubei province. In addition, DSTFNet significantly outperformed two single-branch models that used VHR or MRSITS data alone, as well as the existing BsiNet, ResUNet_a, UNet and RAUNet models. DSTFNet also showed good spatial transferability to distinct regions without training data (on average, MCC = 0.728, Fedge = 0.729, and SEI = 0.281 for three target regions). Fine-tuning DSTFNet with limited training data can further improve its ability to delineate field parcels in complex regions.
Visualization of the temporal attention weights showed that DSTFNet captures cropland spectral dynamics well, which gives it an advantage in extracting diverse cropland parcels. By exploiting important spectral, spatial and temporal information from multimodal satellite data, DSTFNet provided an effective, robust, and transferable solution for accurately delineating agricultural field parcels across heterogeneous farming systems.
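For readers unfamiliar with the field-extent metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient can be computed from a binary reference mask and a binary predicted mask. It uses the standard MCC definition; the function name `matthews_corrcoef_binary`, the toy masks, and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration and do not reproduce the authors' evaluation code.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """MCC for two equal-length sequences of 0/1 labels (1 = field, 0 = non-field)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: MCC is undefined here, so report 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical flattened masks for eight pixels (not data from the study).
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
mcc = matthews_corrcoef_binary(reference, predicted)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it less sensitive than overall accuracy to an imbalance between field and non-field pixels.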