Cropland is the foundation of agricultural development and a prerequisite for food security. Segmenting and extracting cropland from remote sensing images is an essential step in detecting and protecting farmland. This study addresses three challenges in large-area remote sensing information extraction: diverse image sources, multi-scale representations of cropland, and feature confusion between cropland and other land types. To this end, a multi-source self-annotated dataset was developed from GaoFen-2, GaoFen-7, and WorldView satellite images and integrated with the public GID and LoveDA datasets to create the CRMS dataset. A novel semantic segmentation network, the Global–Local Context Aggregation Network (GLCANet), is proposed. It combines a Bilateral Feature Encoder (BFE) built on CNNs and Transformers with a global–local information mining module (GLM) to enhance global context extraction and improve cropland separability. It also employs a multi-scale progressive upsampling structure (MPUS) to improve segmentation accuracy for the diverse cropland representations found in multi-source imagery. To address inconsistent features within the cropland class, a loss function based on hard sample mining and multi-scale features was constructed. Experimental results show that GLCANet improves overall accuracy (OA) and mean intersection over union (mIoU) by 3.2% and 2.6%, respectively, over existing advanced networks on the CRMS dataset. The proposed method also demonstrated high precision and practicality in segmenting large-area croplands in Chongzhou City, Sichuan Province, China.
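The two reported metrics, OA and mIoU, can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions; the abstract does not state the exact evaluation protocol, so the function names, the two-class labeling (0 = background, 1 = cropland), and the toy pixel labels below are hypothetical and are not taken from the CRMS dataset.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA = correctly classified pixels / all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """mIoU = mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, reference differs
        fn = sum(cm[c]) - tp                       # reference c, predicted differs
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy flattened label maps (assumed labels: 0 = background, 1 = cropland).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=2)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"OA = {oa:.3f}, mIoU = {miou:.3f}")
```

OA rewards every correctly labeled pixel equally, so it can be dominated by large classes, whereas mIoU averages per-class overlap and is therefore more sensitive to errors on smaller classes. That difference is one reason segmentation studies commonly report both.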