Explainable Crop Classification Using a BERT-Based Bidirectional Attention Multimodal Transformer
Accelerating climate change and the intensifying global food security crisis have increased the importance of reliable crop classification across diverse environmental conditions. Existing crop classification models have primarily focused on improving accuracy by learning spectral and temporal patterns from satellite imagery; however, their black-box nature makes it difficult to understand the rationale behind each prediction, limiting their applicability in real-world agricultural decision-making. To address this issue, this study introduces a multimodal Transformer model that incorporates a BERT-based bidirectional attention mechanism, aiming to retain classification performance while enhancing interpretability. The proposed BERT Hybrid model employs a PVT backbone to extract spatial features from Sentinel-2 satellite imagery and integrates them with meteorological time-series embeddings; bidirectional self-attention is then used to jointly model cross-temporal and cross-modal interactions. We further conduct comparative experiments under the same conditions as the MMST-ViT (Multi-Modal Spatial-Temporal Vision Transformer) baseline, evaluating not only overall accuracy but also temporal attention patterns across crop growth stages and the relative importance of different weather variables. Experimental results show that bidirectional attention alleviates excessive focus on specific timestamps or single variables, producing more consistent and interpretable attention distributions. This study highlights the performance–interpretability trade-off in multimodal agricultural AI models and provides a foundation for building trustworthy deep-learning systems for crop monitoring.
In addition, because the proposed approach relies solely on globally accessible Sentinel-2 satellite imagery and publicly available meteorological data, it demonstrates the potential for constructing large-scale crop monitoring systems at low cost, aligning with the principles of appropriate technology.
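The joint bidirectional attention described above can be illustrated with a minimal NumPy sketch. It assumes one pre-extracted embedding per Sentinel-2 acquisition date (standing in for the PVT features) and one embedding per meteorological time step, and it uses random projections in place of learned weights; the function names, shapes, and the entropy and modality-share summaries are hypothetical illustrations, not the authors' implementation or metrics. Because no causal mask is applied, every token attends to every other token in both time directions and across modalities.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_bidirectional_attention(sat_tokens, wx_tokens, rng):
    """Single-head self-attention over concatenated satellite and weather tokens.

    sat_tokens: (T_s, d) hypothetical per-date image embeddings
    wx_tokens:  (T_w, d) hypothetical per-step weather embeddings
    No causal mask: attention is bidirectional in time and crosses modalities.
    """
    x = np.concatenate([sat_tokens, wx_tokens], axis=0)  # (T_s + T_w, d)
    d = x.shape[1]
    # Random projections stand in for learned query/key/value weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, N), each row sums to 1
    return attn @ v, attn

def modality_share(attn, n_sat):
    # Average fraction of attention mass each query sends to satellite vs. weather tokens.
    sat = attn[:, :n_sat].sum(axis=1).mean()
    return sat, 1.0 - sat

def attention_entropy(attn):
    # Mean Shannon entropy of the attention rows; low values indicate focus on few tokens.
    return float(-(attn * np.log(attn + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
sat = rng.standard_normal((6, 16))   # e.g. 6 Sentinel-2 acquisition dates
wx = rng.standard_normal((12, 16))   # e.g. 12 weather time steps
out, attn = joint_bidirectional_attention(sat, wx, rng)
print(out.shape, attn.shape)  # (18, 16) (18, 18)
print(modality_share(attn, n_sat=6), attention_entropy(attn))
```

Summaries such as the row entropy are one possible way to quantify "excessive focus on specific timestamps"; the study may use a different measure.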
- Research Article
17
- 10.1109/lsp.2022.3181849
- Jan 1, 2022
- IEEE Signal Processing Letters
As a crucial task for video analysis, social relation recognition from characters gives intelligent applications great potential to better understand human behaviors and emotions. Most existing methods focus on training models from large amounts of labeled data. However, labeling social relations in videos is time-consuming. To solve this problem, we propose a Pre-trained Multimodal Feature Learning (PMFL) framework for self-supervised learning from unlabeled video data, and then transfer the pre-trained PMFL to the downstream social relation recognition task. First, the space-time interaction between visual instances and the cross-modal interaction between visual and textual information provide important cues for social relation understanding. To incorporate these cues, we design a Multimodal Instance Interaction Transformer (MIIT), which consists of two Transformers that capture intra-modal and cross-modal information interaction, respectively. Second, to better endow PMFL with the capability of learning visual and textual semantic features, we pre-train it via two tasks: Masked Action Feature Regression (MAFR) and Masked Object Label Classification (MOLC). These tasks help learn both intra-modal and cross-modal semantic information. After fine-tuning from the pre-trained parameters, PMFL achieves state-of-the-art results on a public benchmark.
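The mechanics of a masked feature regression objective such as MAFR can be sketched as follows. This is a minimal illustration under stated assumptions: masked feature vectors are replaced with zeros, the loss is mean squared error over masked positions only, and the 15% mask ratio is borrowed from the BERT convention. None of these details are given in the abstract, and the helper names are hypothetical.

```python
import numpy as np

def mask_features(feats, mask_ratio, rng):
    """Randomly hide a fraction of feature vectors (one per row) by zeroing them.

    Returns the corrupted features and a boolean mask marking the hidden rows.
    """
    n = feats.shape[0]
    n_mask = max(1, int(round(mask_ratio * n)))
    idx = rng.choice(n, size=n_mask, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    corrupted = feats.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def masked_regression_loss(pred, target, mask):
    # Mean squared error computed only over the masked positions.
    diff = pred[mask] - target[mask]
    return float((diff ** 2).mean())

rng = np.random.default_rng(0)
action_feats = rng.standard_normal((10, 8))  # hypothetical: 10 instances, 8-dim action features
corrupted, mask = mask_features(action_feats, mask_ratio=0.15, rng=rng)
# A real encoder (e.g. the MIIT) would predict the hidden features from context;
# here the corrupted input itself serves as a trivial predictor to show the loss mechanics.
loss = masked_regression_loss(corrupted, action_feats, mask)
print(mask.sum(), loss)
```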
- Research Article
1
- 10.5194/isprs-annals-x-5-w2-2025-397-2025
- Dec 19, 2025
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. The demand for food production is increasing rapidly with a surge in the population. To cope with this increasing food demand, precise agricultural management is essential. Existing techniques rely on extensive field surveys to discriminate agricultural land. To reduce the labor and time these surveys require, automated techniques for precise crop type mapping and monitoring have been adopted. These techniques use satellite imagery and advanced machine learning for crop type mapping and monitoring. However, their performance is affected by factors such as fragmented land parcels, seasonal variability, and inconsistent field-level observations. To overcome these issues, this study attempts to classify grape and non-grape crops and monitor their phenological stages in a study area in Pune district, India, using Sentinel-2 satellite imagery and two deep learning (DL) segmentation techniques: U-Net and DeepLabV3. Further, Sentinel-1C SAR imagery (VV and VH polarization) for the years 2016 to 2024 was used to train and evaluate a long short-term memory (LSTM) network to analyze the temporal behavior of the grape crop from pruning to harvest, with emphasis on growth stages such as leaf set, fruit set, and ripening. The experimental results demonstrate that U-Net outperforms DeepLabV3 (F1-score: 0.96; mAP: 0.95) in grape crop classification. The LSTM model achieved an F1-score of 0.82 for phenological stage identification. This study can help agricultural stakeholders carry out effective, large-scale crop discrimination with minimal human intervention, and it has the potential to reveal grape distribution and development stages more quickly.
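The F1-scores reported for grape versus non-grape classification can be computed pixel-wise from a predicted map and a reference map. The sketch below assumes binary masks (1 = grape, 0 = non-grape) and a single positive class; the study may average over classes or compute the score per parcel instead.

```python
import numpy as np

def binary_f1(pred, truth):
    """Pixel-wise F1-score for a binary map (1 = grape, 0 = non-grape)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # grape predicted as grape
    fp = np.logical_and(pred, ~truth).sum()   # non-grape predicted as grape
    fn = np.logical_and(~pred, truth).sum()   # grape missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0   # both maps empty: perfect agreement

truth = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
print(binary_f1(pred, truth))  # 2*TP / (2*TP + FP + FN) = 4 / 6
```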
- Research Article
6
- 10.3390/agronomy14051084
- May 20, 2024
- Agronomy
Accurate crop classification is vital for agricultural water management. Most studies have optimized crop classification models within a single temporal and regional domain by adjusting input feature values. This study aims to improve the accuracy of crop classification across temporal and spatial domains. Sentinel-2 satellite imagery is employed for crop classification training and prediction in selected farming areas of Heilongjiang Province by calculating vegetation indices and constructing sequential input feature datasets. The HUNTS filtering method was used to mitigate the influence of cloud cover, which increased the stability and completeness of the input feature data across different years. To address shifts in input feature values during cross-scale classification, this study proposes the hypothesis testing distribution method (HTDM). This method balances the distribution of input feature values in the test set even without knowledge of the crop distribution, thereby improving classification accuracy on the test set. The results indicate that the HTDM significantly improves prediction accuracy in cases of substantial variance in image quality. In 2022, crop type recognition accuracy at all farms processed with the HTDM exceeded 87%, demonstrating the strong robustness of the HTDM.
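Building a sequential feature dataset from vegetation indices can be sketched as below. The abstract does not name the indices used, so NDVI (from Sentinel-2 bands B8, near-infrared, and B4, red) is shown here as a representative example; the reflectance values are made up, and neither the HUNTS filtering nor the HTDM is reproduced.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red); for Sentinel-2, NIR = B8 and Red = B4."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)   # eps avoids division by zero on dark pixels

def build_sequence(scenes):
    """Stack per-date NDVI maps into a (pixels, dates) feature matrix.

    scenes: list of (B8, B4) array pairs, one pair per acquisition date, in time order.
    """
    layers = [ndvi(b8, b4).ravel() for b8, b4 in scenes]
    return np.stack(layers, axis=1)

# Hypothetical 2x2-pixel surface reflectances on two dates.
b8_t1 = np.array([[0.40, 0.35], [0.30, 0.05]])
b4_t1 = np.array([[0.08, 0.10], [0.12, 0.04]])
b8_t2 = np.array([[0.45, 0.30], [0.25, 0.06]])
b4_t2 = np.array([[0.05, 0.09], [0.10, 0.05]])
feats = build_sequence([(b8_t1, b4_t1), (b8_t2, b4_t2)])
print(feats.shape)  # (4, 2): one row per pixel, one column per date
```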
- Conference Article
2
- 10.1109/rast.2017.8002998
- Jun 1, 2017
Increasing world population, global climate change, and environmental deterioration force the food and agriculture sector to increase production under difficult conditions. Crop monitoring and precision agriculture are key components of this process, and crop classification is usually the first step. Remote sensing provides a less costly and more practical alternative to conventional crop classification methods. In this study, the contribution of spatial information provided by morphological opening and closing profiles to the crop classification performance of time-series SAR and electro-optical satellite data is assessed separately for each sensor. For both, significant improvements in overall classification accuracy are observed; however, SAR benefited more. In a nine-class classification problem, overall classification accuracy reached around 90% for both sensors.
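A morphological profile of the kind described above stacks successive openings and closings of a band. The sketch below assumes flat square structuring elements of hypothetical sizes 3, 5, and 7 pixels with edge padding; the study's actual structuring elements and sizes are not given. It uses only NumPy (`sliding_window_view` requires NumPy 1.20 or later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _filter(img, size, reducer):
    # Apply a flat square structuring element of odd side `size`, padding edges by replication.
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # (H, W, size, size)
    return reducer(windows, axis=(-2, -1))

def opening(img, size):
    # Erosion (local min) then dilation (local max): removes bright structures smaller than the SE.
    return _filter(_filter(img, size, np.min), size, np.max)

def closing(img, size):
    # Dilation then erosion: fills dark structures smaller than the SE.
    return _filter(_filter(img, size, np.max), size, np.min)

def morphological_profile(img, sizes=(3, 5, 7)):
    """Stack openings (largest first), the original band, and closings into (H, W, 2*len(sizes)+1)."""
    opens = [opening(img, s) for s in sizes]
    closes = [closing(img, s) for s in sizes]
    return np.stack(opens[::-1] + [img] + closes, axis=-1)

spike = np.zeros((7, 7))
spike[3, 3] = 1.0                       # a single bright pixel, e.g. speckle in a SAR band
profile = morphological_profile(spike)
print(profile.shape, opening(spike, 3).max())  # the 3x3 opening removes the isolated spike
```

Each pixel's profile vector can then be appended to its spectral or backscatter time series as extra classification features.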
- Research Article
- 10.3390/agriculture16070727
- Mar 25, 2026
- Agriculture
Accurate crop classification is critical for optimizing agricultural resource use and informing production decisions. Deep learning, with its robust feature extraction ability, has become a prevalent technique for remote sensing-based crop classification. However, the complexity of agricultural landscapes poses three key challenges: background noise interference, class confusion from inter-crop spectral similarity, and blurred boundaries of small-area crops due to class imbalance. This paper proposes FCR-TransUNet, an enhanced TransUNet-based model integrating three modules: a Feature Enhancement Module (FEM) for noise filtering, Class-Attention (CA…). Experimental results on the Youyi Farm and barley datasets validate the superiority of the proposed model. On the Youyi Farm dataset, FCR-TransUNet achieves an MIoU of 92.2%, an improvement of 1.8% over SAM2-UNet and 2.9% over the baseline TransUNet. On the barley dataset, it yields an MIoU of 89.9%. Ablation studies further verify the effectiveness of each module. To evaluate the classification performance of FCR-TransUNet across the full crop growth cycle, experiments were conducted using remote sensing images from May, July, and August. The results demonstrate that FCR-TransUNet exhibits strong stability and adaptability at different crop growth stages, providing a reliable solution for precision agriculture and intelligent agricultural production.
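The MIoU values reported for FCR-TransUNet follow the usual definition: per-class intersection over union, averaged over classes. The sketch below computes it from a confusion matrix; whether the authors include a background class or skip absent classes is not stated, so skipping classes absent from both maps is an assumption here.

```python
import numpy as np

def confusion_matrix(truth, pred, n_classes):
    """Rows = reference class, columns = predicted class."""
    idx = n_classes * np.asarray(truth).ravel() + np.asarray(pred).ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def mean_iou(cm):
    # IoU_c = TP_c / (TP_c + FP_c + FN_c); classes absent from both maps are skipped.
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    valid = denom > 0
    return float((tp[valid] / denom[valid]).mean())

truth = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(truth, pred, n_classes=3)
print(cm.tolist(), mean_iou(cm))  # per-class IoU is 1/3, 2/3, 1/2, so MIoU = 0.5
```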
- Research Article
13
- 10.3390/rs13132517
- Jun 27, 2021
- Remote Sensing
Accurate temporal land use mapping provides important and timely information for decision making in large-scale land and crop production management. Current temporal land cover and crop classifications within a study area neglect differences between subregions. In this paper, we propose a classification rule integrating terrain, time series characteristics, priority, and seasonality (TTPSR) with Sentinel-2 satellite imagery. Based on time series of the Normalized Difference Water Index (NDWI) and the Normalized Difference Vegetation Index (NDVI), a dynamic decision tree for forest, cultivated, urban, and water classes was created in Google Earth Engine (GEE) for each subregion to extract cultivated land. Then, with or without this cultivated land mask, the original classification results for each subregion were produced with Random Forest from composite images with five vegetation indices. During post-reclassification, a 4-bit coding rule based on terrain, type, seasonal rhythm, and priority was generated by analyzing the characteristics of the original results. Finally, statistical results and temporal maps were produced. The results showed that feature importance was dominated by B2, NDWI, RENDVI, B11, and B12 in winter, and by B11, B12, NDBI, B2, and B8A in summer. Meanwhile, the cultivated land mask improved the overall accuracy for multiple categories (seven to eight in winter and nine to 13 in summer) in each subregion, with average overall accuracy ranges of 0.857–0.935 in winter and 0.873–0.963 in summer, and kappa coefficients of 0.803–0.902 and 0.835–0.950, respectively. Analysis of these results and comparison with resampling plots identified various sources of classification error, including spectral differences, degree of field fragmentation, and planting complexity.
The results demonstrated the capability of the TTPSR rule in temporal land use mapping, especially for complex crop classification and automated post-processing, thereby providing a viable option for large-scale land use mapping.
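The overall accuracy and kappa coefficients quoted above are both derived from a confusion matrix. The sketch below uses the standard formulas, with overall accuracy $p_o$ and chance agreement $p_e$ computed from the row and column totals; the example matrix is made up for illustration and is not from the study.

```python
import numpy as np

def overall_accuracy(cm):
    # Fraction of samples on the diagonal (correctly classified).
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def cohen_kappa(cm):
    """kappa = (p_o - p_e) / (1 - p_e) for a confusion matrix with reference rows."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n
    # Expected agreement if predictions were independent of the reference labels.
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (p_o - p_e) / (1 - p_e)

cm = [[45, 5], [10, 40]]   # hypothetical two-class validation counts
print(overall_accuracy(cm), cohen_kappa(cm))  # p_o = 0.85, p_e = 0.5, kappa = 0.7
```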
- Research Article
7
- 10.3390/rs14163917
- Aug 12, 2022
- Remote Sensing
The verification and monitoring of agricultural subsidy claims requires the combined evaluation of several criteria at the scale of over a million cultivation units. Sentinel-2 satellite imagery is a promising data source, and paying agencies are encouraged to test its pre-operational use. Here, we present the outcome of the Hungarian agricultural subsidy monitoring pilot: our goal was to propose a solution based on open-source components and to evaluate the main strengths and weaknesses of Sentinel-2 for a complex set of tasks. These include checking the basic cultivation of grasslands and arable land and compliance with the criteria for ecological focus areas. Crop classification was performed with random forest, and cultivation events were detected from NDVI (Normalized Difference Vegetation Index) time series analysis. The outputs of these processes were combined in a decision tree ruleset to provide the final results. We found that crop classification performed well (overall accuracy 88%) for 22 vegetation classes, and cultivation detection was also reliable when compared with on-screen visual interpretation. The main limitation was field size, as fields were frequently small relative to the spatial resolution of the images: more than 4% of the parcels had to be excluded, although these represent less than 3% of the cultivated area of Hungary. Based on these results, we find that operational satellite-based monitoring is feasible for Hungary, and we expect further improvements from integration with Sentinel-1 owing to its additional temporal resolution.
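The abstract does not specify how cultivation events were detected from the NDVI time series. A simple rule of the kind often used for mowing or harvest detection is sketched below, assuming an event is a sharp NDVI drop between consecutive cloud-free observations; the 0.2 threshold, the date labels, and the series values are hypothetical placeholders, not calibrated values from the pilot.

```python
import numpy as np

def detect_cultivation_events(dates, ndvi, min_drop=0.2):
    """Return the dates on which NDVI fell by at least `min_drop` since the previous valid observation.

    dates: acquisition-date labels in time order
    ndvi:  NDVI values aligned with `dates`; NaN marks cloudy or missing observations
    The default threshold is a hypothetical placeholder, not a calibrated value.
    """
    ndvi = np.asarray(ndvi, dtype=float)
    valid = ~np.isnan(ndvi)          # skip cloud-masked dates entirely
    d = np.asarray(dates)[valid]
    v = ndvi[valid]
    drops = v[:-1] - v[1:]           # positive where NDVI decreased
    return d[1:][drops >= min_drop].tolist()

dates = ["05-01", "05-11", "05-21", "05-31", "06-10", "06-20"]
series = [0.55, 0.72, np.nan, 0.35, 0.48, 0.70]   # hypothetical grassland parcel
events = detect_cultivation_events(dates, series)
print(events)  # ['05-31']
```

In a full workflow, flags like these would feed the decision tree ruleset alongside the random forest crop labels.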
- Research Article
12
- 10.1080/15481603.2023.2281142
- Nov 16, 2023
- GIScience & Remote Sensing
Accurate, near-real-time crop mapping from satellite imagery is crucial for agricultural monitoring. However, the seasonal nature of crops makes it difficult to rely on traditional machine learning methods and on samples previously generated within specific domains. In this study, we improved the histogram matching method for color correction of multi-temporal images and tested the performance and classification accuracy of three semantic segmentation models trained on weak samples. Classification experiments were conducted for nine categories in two cities in Henan province from 2019 to 2022 using 10 m resolution Sentinel-2 images with different feature selection schemes. We trained the models using classified and recorrected results at four selected sites in 2019 and 2020, and designed experiments to assess the performance of the improved histogram matching method and to verify the transferability of the semantic segmentation models across regions and years. The experimental results showed that the UNet++ model with feature selection and the improved histogram matching method outperformed the other two models, DeepLab V3+ and UNet, in the crop classification transfer cases, achieving higher classification accuracy. The UNet++ model without training samples achieved the best overall accuracy, Kappa coefficient, and mean F1-score from 2019 to 2022, exceeding 87%, 82%, and 65%, respectively. Representative errors in the weak samples and in the predicted classification results were also analyzed to improve model robustness. As an application of transfer learning to crop mapping, the proposed model effectively addresses the classification of multispectral satellite imagery with missing labels.
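For reference, the classic CDF-based histogram matching that the study builds on can be sketched in NumPy as below. This is the standard method applied to one band, not the improved variant the authors describe, and the synthetic bands are illustrative only.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap `source` values so that its histogram matches `reference` (classic CDF matching)."""
    src = np.asarray(source, dtype=float)
    ref = np.asarray(reference, dtype=float)
    # Unique values, the index of each source pixel into them, and their frequencies.
    _src_vals, src_idx, src_counts = np.unique(src.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(ref.ravel(), return_counts=True)
    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size
    # For each source quantile, look up the reference value at the same quantile.
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_idx].reshape(src.shape)

rng = np.random.default_rng(0)
reference = rng.normal(150.0, 5.0, size=(50, 50))   # e.g. a band from the reference date
source = rng.normal(100.0, 10.0, size=(50, 50))     # same band on a date with different illumination
corrected = match_histogram(source, reference)
print(round(corrected.mean(), 1), round(reference.mean(), 1))
```

Because the mapping is monotonic, the relative ordering of pixel values within the corrected image is preserved while its distribution is aligned with the reference date.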
- Research Article
17
- 10.5194/isprs-archives-xlii-3-w6-187-2019
- Jul 26, 2019
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. The development of kharif rice yield prediction models was attempted using machine learning approaches, namely Artificial Neural Network and Random Forest, for the 42 blocks covering the 13,141 sq km upland rainfed area of Purulia and Bankura districts, West Bengal. Models were developed by integrating monthly NDVI with weather and non-weather variables at block level for the period 2006 to 2015. The model correlation obtained was 0.702, with an MSE of 0.01. Although the weather variables vs. NDVI models are quite satisfactory, the NDVI vs. kharif rice yield models show a weaker correlation of about 0.6, revealing the need to account for various additional farmer-controlled inputs. Developing NDVI vs. crop yield models for individual crop growth stages, or at fortnightly intervals, over a larger dataset, with the selective addition of weather and non-weather variables to NDVI, would be the most appropriate approach.
- Research Article
5
- 10.1111/jbi.14721
- Sep 19, 2023
- Journal of Biogeography
Aim: Spatial models are valuable for revealing biodiversity patterns but are less commonly applied to soil microbes than to aboveground macroorganisms. Ectomycorrhizal (EM) fungi are symbiotic microbes with high taxonomic and functional diversity that are associated with forest trees. We aimed to predict regional‐scale spatial patterns of EM fungal richness and community composition.
Location: Forests and subalpine ecosystems in Japan, from Hokkaido to Okinawa.
Taxon: EM fungi (Asco‐ and Basidiomycetes).
Methods: We used EM fungal DNA sequence data from 1507 soil cores at 39 sites covering a wide range of environmental conditions. The random forest machine learning approach was applied to determine the relative importance of environmental variables (i.e. climate, soil and ecosystem productivity) and to make spatial predictions. The spatial patterns of EM fungal richness and community composition were mapped at 1-km² grid resolution.
Results: Temperature generally had a strong influence on EM fungal richness and community composition dissimilarity. Our regional spatial analysis revealed that (1) EM fungal richness was higher in northeastern and montane regions than in southwestern regions and low‐elevation plains, (2) different EM fungal lineages exhibited contrasting spatial diversity patterns and (3) community composition dissimilarity shifted sharply from high to low elevations, and gradually from northeastern to southwestern regions, mainly in relation to climate gradients. Areas with low applicability for spatial modelling were identified based on multidimensional environmental spaces, which will help to prioritize data collection for future research.
Main Conclusions: Our study provides a baseline of the potential spatial patterns of EM fungal communities, which were explained primarily by climate variables and secondarily by soil factors and ecosystem productivity.
The predicted spatial patterns may be valuable for identifying diversity hotspots and advancing the assessment of climate change impacts on ecologically important root‐associated fungi.
- Research Article
1
- 10.1016/j.envexpbot.2023.105284
- Mar 2, 2023
- Environmental and Experimental Botany
Spatial patterns and determinants of nitrogen composition in the trunk xylem sap of tree species from tropical to temperate forests
- Conference Article
8
- 10.1109/igarss.2019.8900491
- Jul 1, 2019
Real-time monitoring of agricultural crops is an important exercise because of its large impact on agribusiness and agricultural policy management. Identifying crops during multiple growth stages can help formulate better agricultural policies and management strategies. In this context, the objective of this article is to evaluate the potential of Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 optical imagery for crop classification in an Indian region. A multi-class classification algorithm based on the support vector machine (SVM) is applied to temporal features extracted from the above-mentioned satellite datasets. The experiments are conducted for the Kharif and Rabi crop cycles with the major crops in the region. The experiments suggest that the joint use of optical and radar imagery yields better classification accuracy than using either individually. Overall accuracies of 89% and 96% are obtained for Kharif and Rabi crops, respectively.
- Research Article
68
- 10.3390/rs12244052
- Dec 11, 2020
- Remote Sensing
Timely and accurate crop classification is of enormous significance for agricultural management. The Shiyang River Basin, an inland river basin, is one of the most prominent water-scarce regions with intensive agricultural activity in northwestern China. However, no freely available crop map with high spatial resolution exists for the Shiyang River Basin. The European Space Agency (ESA) satellite Sentinel-2 has multi-spectral bands spanning the visible, red-edge, near-infrared, and shortwave-infrared (VIS-RE-NIR-SWIR) spectrum. Understanding the impact of spectral-temporal information on crop classification helps users select optimal spectral band combinations and temporal windows for crop mapping with Sentinel-2 data. In this study, multi-temporal Sentinel-2 data acquired during the 2019 growing season were used with the random forest algorithm to generate a crop classification map at 10 m spatial resolution for the Shiyang River Basin. Four experiments with different combinations of feature sets were carried out to determine which Sentinel-2 information was most effective for higher crop classification accuracy. The results showed that adding multi-spectral and multi-temporal Sentinel-2 information improved crop classification accuracy remarkably, and that the improvement was closely related to the feature selection strategy. Compared with other bands, red-edge band 1 (RE-1) and shortwave-infrared band 1 (SWIR-1) of Sentinel-2 showed greater value for crop classification. Combining images from the early, middle, and late crop growth stages is important for achieving optimal performance. A relatively accurate classification (overall accuracy = 0.94) was obtained using the pivotal spectral bands and image dates. In addition, a crop map with satisfactory accuracy (overall accuracy > 0.9) could be generated as early as late July.
This study offers guidance on selecting targeted spectral bands and image acquisition periods to produce more accurate and timelier crop maps. The proposed method could be transferred to other arid areas with similar agricultural structure and crop phenology.
- Book Chapter
13
- 10.1007/978-3-642-14400-4_32
- Jan 1, 2010
This paper describes an approach to temporal pattern mining that uses user-defined temporal prototypes to define the nature of the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus of the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The dataset is also of interest in itself, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept) are all described in the paper. An evaluation of the frequent pattern mining process is also presented.
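The idea of mapping a sequence of support values onto mathematically defined prototypes can be illustrated with a small sketch. The three prototype shapes, the min-max scaling, and the Euclidean nearest-prototype rule are all hypothetical choices made for illustration; the abstract does not describe how the paper defines or matches its prototypes.

```python
import numpy as np

def make_prototypes(n):
    """Hypothetical user-defined trend prototypes over n time stamps, each scaled to [0, 1]."""
    t = np.linspace(0.0, 1.0, n)
    return {
        "increasing": t,
        "decreasing": 1.0 - t,
        "peak": 1.0 - np.abs(2.0 * t - 1.0),
    }

def scale(seq):
    # Min-max scale so that only the shape of the trend matters; flat sequences map to zeros.
    seq = np.asarray(seq, dtype=float)
    span = seq.max() - seq.min()
    return np.zeros_like(seq) if span == 0 else (seq - seq.min()) / span

def best_prototype(support_seq, prototypes):
    """Return the prototype name with the smallest Euclidean distance to the scaled sequence."""
    s = scale(support_seq)
    return min(prototypes, key=lambda name: np.linalg.norm(s - prototypes[name]))

support = [0.02, 0.04, 0.05, 0.08, 0.11]   # hypothetical support of one frequent pattern per time period
print(best_prototype(support, make_prototypes(len(support))))  # increasing
```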
- Research Article
- 10.3724/j.fjyl.la20250684
- Mar 1, 2026
- Landscape Architecture
<sec><title>Objective</title> Climate change, biodiversity loss, and environmental pollution are widely recognized as the triple planetary crisis. Among them, climate change has intensified the frequency and magnitude of extreme wind events, particularly typhoons, resulting in substantial impacts on ecosystems and human societies. China is located within the active typhoon belt of the northwest Pacific, where approximately 80% of annual typhoons make landfall. Coastal regions exhibit pronounced spatial heterogeneity in wind disaster risk due to complex interactions among topography, climate conditions, and socioeconomic development. Protected areas, as critical spatial units for biodiversity conservation and ecological security, are increasingly exposed to wind hazards. However, systematic assessments of wind disaster risk at the protected-area scale remain limited. Existing studies predominantly adopt the three-dimensional “hazard−exposure−vulnerability” framework proposed by the Intergovernmental Panel on Climate Change (IPCC). In this framework, hazard represents the intensity and frequency of disasters, exposure reflects the degree to which natural and social elements are affected, and vulnerability indicates the likelihood of system damage. While this framework has been widely applied to floods, earthquakes, heatwaves, and other natural hazards, its application to wind disaster risk in protected areas is still insufficient. In particular, previous studies often fail to integrate long-term hazard dynamics with ecological and socio-economic characteristics, limiting their ability to support targeted risk management and spatial planning. </sec><sec><title>Methods</title> To address these gaps, drawing on the <italic>Sixth Assessment Report</italic> of the Intergovernmental Panel on Climate Change, we developed a three-dimensional wind disaster risk assessment framework integrating hazard, exposure, and vulnerability. 
The framework combined multi-source environmental and socio-economic data to quantify wind disaster risk and reveal its spatial differentiation and temporal evolution. The Fuzhou Metropolitan Area was selected as the case study because it lies along China’s southeastern coast and is characterized by frequent typhoon activity, diverse protected area types, and pronounced coastal-inland gradients, making it a representative region for examining wind disaster risks under climate change. Within this framework, wind disaster risk levels of protected areas in 1980 and 2020 were quantified and compared. Multi-criteria evaluation methods were applied to construct the hazard, exposure, and vulnerability indices, while the entropy weight method was used to reduce subjectivity in indicator selection. ArcGIS spatial analysis techniques, including spatial overlay, zonal statistics, and hotspot analysis, were employed to analyze the spatial patterns and temporal dynamics of wind hazards, exposure, vulnerability, and comprehensive risk. At the indicator level, meteorological, topographic, ecological, and socio-economic data were integrated to conduct comparative risk assessments across protected areas in the Fuzhou Metropolitan Area. </sec><sec><title>Results</title> 1) Wind disaster risk exhibited a clear spatial pattern characterized by higher risk in the south (0.57) and lower risk in the north (0.09), with coastal protected areas generally facing higher risk levels than inland areas. Wind disaster risk showed clear spatial clustering, with high-risk protected areas (0.61−0.66) concentrated in the southern and southwestern regions, medium-high risk areas (0.500−0.550) in the central transition zone, and low-risk areas (≤0.01) mainly distributed in the northern and northeastern regions, showing a pronounced south-to-north decreasing gradient.
2) Exposure levels across protected areas were generally moderate to high, while vulnerability showed an overall increasing trend from 1980 to 2020, indicating growing sensitivity of protected areas to wind hazards over time. In 1980, high-exposure areas (0.59−0.62) were located in the northwestern mountains and central hills, and low-exposure areas (0.04) lay along the eastern coast. By 2020, high-exposure zones persisted but their values declined (e.g., from 0.59 to 0.31), while low coastal exposure remained unchanged, indicating a stable spatial pattern with an overall decrease in exposure. 3) Comprehensive wind disaster risk differed markedly among protected area types, ranked from highest to lowest as forest parks, scenic areas, nature reserves, wetland parks, and geological parks. High-risk protected areas, including Jiulihu Scenic Area and the Dafeishan and Biqing Forest Parks (0.54−0.57), clustered in the southern and south-central region. Medium-risk areas (0.30−0.50) occupied central and coastal transitional zones. Low-risk areas, such as Dongchong Peninsula, Sandu’ao, and Baiyunshan Parks (≤0.20), were located in the north and in inland mountains. </sec><sec><title>Conclusion</title> Based on these findings, we proposed three planning optimization strategies for protected areas: optimizing functional zoning to reflect spatial risk differentiation, establishing dynamic wind hazard monitoring and early-warning mechanisms, and implementing pilot-based differentiated risk mitigation measures tailored to specific risk profiles. We analyzed wind disaster risks across temporal and spatial scales and visualized their dynamics through spatial mapping. By focusing on the protected-area level, the assessment identifies fine-scale spatial heterogeneity and temporal evolution patterns that are often obscured in conventional assessments. By revealing the spatial patterns and evolution characteristics of wind disaster risk from a protected-area perspective, we provided an assessment framework that balances universality and practicality.
The framework can offer practical support for climate-resilient planning and governance of protected area systems under ongoing climate change. </sec>
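The entropy weight method named in the Methods can be sketched in its textbook form: min-max normalize each indicator, convert each column to shares $p_{ij}$, compute the normalized entropy $e_j = -\frac{1}{\ln n}\sum_i p_{ij}\ln p_{ij}$, and set weights proportional to $1 - e_j$. The indicators, their direction (benefit or cost), and the weighted-sum aggregation shown here are assumptions; the abstract does not state how the hazard, exposure, and vulnerability indices are combined.

```python
import numpy as np

def entropy_weights(x, benefit=None):
    """Entropy weight method for an (n_units, n_indicators) matrix.

    benefit: boolean per indicator; True if larger values should raise the index,
             False if they should lower it (default: all True).
    Returns the indicator weights and the min-max normalized matrix.
    """
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    benefit = np.ones(m, dtype=bool) if benefit is None else np.asarray(benefit, dtype=bool)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    norm = np.where(benefit, (x - lo) / span, (hi - x) / span)
    # Share of each unit within each indicator column.
    col = norm.sum(axis=0)
    p = norm / np.where(col > 0, col, 1.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    e = -plogp.sum(axis=0) / np.log(n)      # normalized entropy in [0, 1]
    e = np.where(col > 0, e, 1.0)           # a constant indicator carries no information
    d = 1.0 - e                             # degree of divergence
    return d / d.sum(), norm

# Hypothetical hazard indicators for four protected areas (columns: wind speed,
# typhoon frequency, and a constant placeholder indicator).
x = np.array([[12.0, 1, 5], [18.0, 2, 5], [15.0, 3, 5], [25.0, 4, 5]])
w, norm = entropy_weights(x)
index = norm @ w   # weighted composite index per area (one possible aggregation)
print(w.round(3), index.round(3))
```

Indicators that vary more across units receive larger weights, which is how the method replaces subjective weighting with a data-driven one.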