Gabor Feature Network for Transformer-Based Building Change Detection Model in Remote Sensing
Detecting building change in bitemporal remote sensing (RS) imagery requires a model to highlight changes in buildings while ignoring irrelevant changes in other objects and sensing conditions. Buildings have comparatively less diverse textures than other objects and appear as repetitive visual patterns in RS images. In this paper, we propose a Gabor Feature Network (GFN) to extract the distinctive repetitive texture features of buildings. We also design a Feature Fusion Module (FFM) that fuses the multiscale features extracted by GFN with the features from a Transformer-based encoder, passing the texture features on to different parts of the model. Using GFN and FFM, we design a Transformer-based model for building change detection, called GabFormer. Experimental results on the LEVIR-CD and WHU-CD datasets indicate that GabFormer outperforms other SOTA models and, in particular, shows a significant improvement in generalization capability. Our code is available at https://github.com/Ayana-Inria/GabFormer.
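As a rough illustration of the texture cue this abstract relies on (not the paper's GFN, which learns its filtering inside a network), a classical Gabor filter bank can be built and applied with plain NumPy; the kernel size, wavelength, and other parameters below are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a Gabor kernel: a Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the filter orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

def gabor_responses(image, orientations=4):
    """Stack filter responses over several orientations (naive zero-padded correlation)."""
    feats = []
    for k in range(orientations):
        kern = gabor_kernel(theta=k * np.pi / orientations)
        pad = kern.shape[0] // 2
        padded = np.pad(image, pad)
        out = np.zeros_like(image, dtype=float)
        H, W = image.shape
        for i in range(H):
            for j in range(W):
                out[i, j] = np.sum(padded[i:i + kern.shape[0], j:j + kern.shape[1]] * kern)
        feats.append(out)
    return np.stack(feats)  # shape: (orientations, H, W)
```

A repetitive stripe pattern, such as a row of rooftops, produces its strongest response in the filter whose orientation matches the pattern, which is the signal GFN-style features exploit.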
- Research Article
- 10.3390/rs15092470
- May 8, 2023
- Remote Sensing
Change detection is a critical task in remote sensing Earth observation, identifying changes in the Earth's surface from multi-temporal image pairs. However, because image collection is time-consuming, pixel-level labeling is labor-intensive, building changes occur rarely, and observation locations are limited, it is difficult to build a large, class-balanced, and diverse building change detection dataset. The result is too few changed sample pairs for training change detection models, which degrades their performance. Given that data scarcity and class imbalance lead to insufficient training of building change detection models, this article proposes a novel multi-temporal sample pair generation method, Image-level Sample Pair Generation (ISPG), which improves change detection performance through dataset expansion: it generates additional valid multi-temporal sample pairs to overcome the limited change information and class imbalance of existing datasets. To achieve this, a Label Translation GAN (LT-GAN) was designed to generate complete remote sensing images with diverse building changes and background pseudo-changes, without any of the complex blending steps used in previous works. To obtain more detailed features in image pair generation, especially the context surrounding buildings, we designed a multi-scale adversarial loss (MAL) and a feature matching loss (FML) to supervise and improve the quality of the generated bitemporal remote sensing image pairs. We also constrain the distribution of generated buildings to follow the patterns of human-built structures.
The proposed approach was evaluated on two building change detection datasets (LEVIR-CD and WHU-CD), and the results proved that the proposed method achieves state-of-the-art (SOTA) performance even when plain change detection models are used. In addition, the proposed image pair generation approach is a plug-and-play solution that can improve the performance of any change detection model.
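The feature matching loss (FML) mentioned above is, in its common form, an L1 distance between paired intermediate discriminator features of real and generated images. A framework-agnostic sketch follows; the function name and the per-layer averaging are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def feature_matching_loss(feats_real, feats_fake):
    """Mean L1 distance between paired intermediate feature maps, as commonly
    used to stabilise GAN training by matching discriminator statistics."""
    assert len(feats_real) == len(feats_fake)
    total = 0.0
    for fr, ff in zip(feats_real, feats_fake):
        total += np.mean(np.abs(fr - ff))  # per-layer mean absolute difference
    return total / len(feats_real)         # average over layers
```

In training, `feats_real` and `feats_fake` would be the discriminator's activations at several depths for a real and a generated image pair, so the generator is pushed to match features at multiple scales rather than only the final real/fake score.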
- Research Article
- 10.19184/geosi.v3i2.7934
- Aug 28, 2018
- Geosfera Indonesia
An Assessment of Spatial Variation of Land Surface Characteristics of Minna, Niger State, Nigeria for Sustainable Urbanization Using Geospatial Techniques
- Research Article
- 10.1186/s40537-024-00903-y
- Apr 4, 2024
- Journal of Big Data
Efficiently treating cardiac patients before the onset of a heart attack relies on the precise prediction of heart disease. Identifying and detecting risk factors for heart disease such as diabetes mellitus, Coronary Artery Disease (CAD), hyperlipidemia, hypertension, smoking, familial CAD history, obesity, and medications is critical for developing effective preventative and management measures. Although Electronic Health Records (EHRs) have emerged as valuable resources for identifying these risk factors, their unstructured format poses challenges for cardiologists in retrieving relevant information. This research proposed employing transfer learning techniques to automatically extract heart disease risk factors from EHRs. Transfer learning, a deep learning technique, has demonstrated significant performance in various clinical natural language processing (NLP) applications, particularly in heart disease risk prediction. This study explored the application of transformer-based language models, specifically pre-trained architectures such as BERT (Bidirectional Encoder Representations from Transformers), RoBERTa, BioClinicalBERT, XLNet, and BioBERT, for heart disease detection and the extraction of related risk factors from clinical notes, using the i2b2 dataset. These transformer models are pre-trained on an extensive corpus of medical literature and clinical records to gain a deep understanding of contextualized language representations. The adapted models are then fine-tuned on annotated datasets specific to heart disease, such as the i2b2 dataset, enabling them to learn patterns and relationships within the domain. These models have demonstrated superior performance in extracting semantic information from EHRs, automating high-performance heart disease risk factor identification, and performing downstream NLP tasks within the clinical domain.
This study fine-tuned five widely used transformer-based models, namely BERT, RoBERTa, BioClinicalBERT, XLNet, and BioBERT, on the 2014 i2b2 clinical NLP challenge dataset. The fine-tuned models surpass conventional approaches in predicting the presence of heart disease risk factors with impressive accuracy. The RoBERTa model achieved the highest performance, with a micro F1-score of 94.27%, while the BERT, BioClinicalBERT, XLNet, and BioBERT models provided competitive performances with micro F1-scores of 93.73%, 94.03%, 93.97%, and 93.99%, respectively. Finally, a simple ensemble of the five transformer-based models is proposed, which outperformed most existing methods in heart disease risk factor identification, achieving a micro F1-score of 94.26%. This study demonstrated the efficacy of transfer learning with transformer-based models in enhancing risk prediction and facilitating early intervention for heart disease prevention.
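The micro F1-scores reported above pool true positives, false positives, and false negatives over all risk-factor labels before computing precision and recall, so frequent labels weigh more than rare ones. A minimal sketch (the set-based input format is an assumption for illustration):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label prediction.

    y_true, y_pred: lists of sets of labels per document,
    e.g. the risk factors found in one clinical note."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but absent
        fn += len(truth - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, three notes with one missed label and one spurious label overall give precision = recall = 3/4, hence micro F1 = 0.75.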
- Research Article
- 10.3390/rs10050711
- May 4, 2018
- Remote Sensing
Nowadays, our ability to acquire remote sensing data has been improved to an unprecedented level.[...]
- Research Article
- 10.18178/ijfcc.2024.13.4.619
- Jan 1, 2024
- International Journal of Future Computer and Communication
Deep learning is a rapidly developing branch of neural network research, and its application to remote sensing image classification and recognition has attracted wide attention and discussion. This paper first briefly introduces traditional remote sensing image processing methods and emphasizes their limitations. It then reviews the state of research on deep learning-based target recognition and change detection in remote sensing images, and discusses how to select and design appropriate deep learning models. Datasets from two different provinces were then selected for comparative experiments, and the implementation of target recognition and change detection in remote sensing images is described in detail. Finally, based on the experimental results, future trends in the application of deep learning to remote sensing identification and classification are discussed.
- Research Article
- 10.34248/bsengineering.1736319
- Dec 24, 2025
- Black Sea Journal of Engineering and Science
This study evaluates the performance of mainstream object detection models, namely YOLO12, Mask R-CNN, RT-DETR, and RF-DETR, on the Open Images (multi-object) and LaSOT (single-object) datasets. Object detection is a key technology in computer vision, and very significant progress has been made over the last decades. Current cutting-edge applications involve CNN-based and transformer-based object detection models. CNN-based models use either one-pass (YOLO family) or two-pass (R-CNN family) designs; one-pass detectors are faster but tend to sacrifice accuracy relative to two-pass models. Transformer-based models build on Detection Transformers or Vision Transformers; they are gaining popularity, their accuracy surpasses that of CNN-based models, and their speed is improving. This study evaluates YOLO12 and Mask R-CNN from the CNN-based family, and the RT-DETR and RF-DETR transformer-based architectures, in terms of accuracy and runtime on the Open Images and LaSOT datasets. All models are the largest variants provided by their authors and are pretrained on the COCO dataset. The transformer-based models incorporate specialized forms of self-attention and show significant improvements in both accuracy and speed. The experimental results demonstrate that attention- and transformer-based models outperform traditional CNN-based object detectors.
- Research Article
- 10.1177/20420986251405082
- Dec 18, 2025
- Therapeutic Advances in Drug Safety
Background: Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions. Objective: This study aimed to evaluate whether transformer-based models fine-tuned on a general ADR dataset can effectively classify ADRs from tweets related to glucagon-like peptide-1 (GLP-1) receptor agonists and to benchmark their performance against state-of-the-art large language models (LLMs). Design: This study employed a machine learning approach using transformer-based language models to classify ADRs in social media. Methods: BERT (Bidirectional Encoder Representations from Transformers)-base, BERTweet-base, and GPT-2 (Generative Pre-Trained Transformer-2) models were fine-tuned on the Sarker and SIDER (Side Effect Resource) datasets for ADR classification. The test dataset comprised 396 tweets mentioning GLP-1 receptor agonists that were categorized as personal experiences. Model performance was primarily evaluated using the F1 score, which was used to select the optimal model. In addition, the fine-tuned transformer models were benchmarked against state-of-the-art LLMs, including ChatGPT 4o, ChatGPT 4o-mini, and Gemini 2.5 Flash. Results: Among 396 tweets, 116 (29.3%) were classified as ADRs and 280 (70.7%) as non-ADRs. Among the transformer-based models, BERTweet-base achieved the highest performance (accuracy: 0.835, F1: 0.729), outperforming both BERT-base (accuracy: 0.826, F1: 0.679) and GPT-2 (accuracy: 0.766, F1: 0.628). Among the LLMs, ChatGPT 4o-mini demonstrated the best results (accuracy: 0.970, F1: 0.948), followed by Gemini 2.5 Flash (accuracy: 0.954, F1: 0.919) and ChatGPT 4o (accuracy: 0.936, F1: 0.895).
Overall, the LLMs substantially outperformed the fine-tuned transformer-based models. Conclusion: Fine-tuned transformer-based models demonstrated reasonable performance in ADR detection from GLP-1 receptor agonist tweets, with BERTweet-base performing best. However, state-of-the-art LLMs, particularly ChatGPT 4o-mini, substantially outperformed these models, highlighting their potential for pharmacovigilance tasks.
- Research Article
- 10.3390/rs15143566
- Jul 16, 2023
- Remote Sensing
Remote sensing change detection (RSCD) is crucial for understanding the dynamic patterns of the Earth's surface and human influence. Recently, transformer-based methods have advanced RSCD tasks thanks to their powerful global modeling capabilities. Nevertheless, they remain excessively parameterized and are severely constrained by time and computation resources. Here, we present a transformer-based RSCD model called the Segmentation Multi-Branch Change Detection Network (SMBCNet). Our approach combines a hierarchically structured transformer encoder with a cross-scale enhancement module (CEM) to extract global information at lower complexity. To account for the diverse nature of changes, we introduce a plug-and-play multi-branch change fusion module (MCFM) that integrates temporal features; within this module, the change detection task is cast as a semantic segmentation problem. Moreover, we introduce a Temporal Feature Aggregation Module (TFAM) to integrate features from diverse spatial scales. The results demonstrate that semantic segmentation is an effective solution to change detection (CD) problems in remote sensing images.
- Research Article
- 10.3390/rs15245670
- Dec 8, 2023
- Remote Sensing
The detection of building changes (hereafter ‘building change detection’, BCD) is a critical issue in remote sensing analysis. Accurate BCD faces challenges, such as complex scenes, radiometric differences between bi-temporal images, and a shortage of labelled samples. Traditional supervised deep learning requires abundant labelled data, which is expensive to obtain for BCD. By contrast, there is ample unlabelled remote sensing imagery available. Self-supervised learning (SSL) offers a solution, allowing learning from unlabelled data without explicit labels. Inspired by SSL, we employed the SimSiam algorithm to acquire domain-specific knowledge from remote sensing data. Then, these well-initialised weight parameters were transferred to BCD tasks, achieving optimal accuracy. A novel framework for BCD was developed using self-supervised contrastive pre-training and historical geographic information system (GIS) vector maps (HGVMs). We introduced the improved MS-ResUNet network for the extraction of buildings from new temporal satellite images, incorporating multi-scale pyramid image inputs and multi-layer attention modules. In addition, we pioneered a novel spatial analysis rule for detecting changes in building vectors in bi-temporal images. This rule enabled automatic BCD by harnessing domain knowledge from HGVMs and building upon the spatial analysis of building vectors in bi-temporal images. We applied this method to two extensive datasets in Liuzhou, China, to assess its effectiveness in both urban and suburban areas. The experimental results demonstrated that our proposed approach offers a competitive quantitative and qualitative performance, surpassing existing state-of-the-art methods. Combining HGVMs and high-resolution remote sensing imagery from the corresponding years is useful for building updates.
- Research Article
- 10.3390/rs13234779
- Nov 25, 2021
- Remote Sensing
Remote sensing image object detection and instance segmentation are widely valued research fields. Convolutional neural networks (CNNs) have shown shortcomings in object detection for remote sensing images. In recent years, the number of studies on transformer-based models has increased, and these studies have achieved good results. However, transformers still suffer from poor small object detection and unsatisfactory edge detail segmentation. To solve these problems, we improved the Swin transformer by combining the advantages of transformers and CNNs, and designed a local perception Swin transformer (LPSW) backbone to enhance the local perception of the network and to improve the detection accuracy of small-scale objects. We also designed a spatial attention interleaved execution cascade (SAIEC) network framework, which helps to strengthen the segmentation accuracy of the network. Due to the lack of remote sensing mask datasets, the MRS-1800 remote sensing mask dataset was created. Finally, we combined the proposed backbone with the new network framework and conducted experiments on this MRS-1800 dataset. Compared with the Swin transformer, the proposed model improved the mask AP by 1.7%, mask APS by 3.6%, AP by 1.1% and APS by 4.6%, demonstrating its effectiveness and feasibility.
- Research Article
- 10.1155/2020/2725186
- Mar 23, 2020
- Journal of Spectroscopy
To improve the change detection accuracy of multitemporal high spatial resolution remote-sensing (HSRRS) images, a change detection method based on saliency detection and spatial intuitionistic fuzzy C-means (SIFCM) clustering is proposed. First, the cluster-based saliency cue method is used to obtain the saliency maps of the two temporal remote-sensing images; then, the saliency difference is obtained by subtracting the two saliency maps; finally, the SIFCM clustering algorithm is used to classify the saliency difference image into changed and unchanged regions. Two datasets of multitemporal high spatial resolution remote-sensing images were selected as experimental data, on which the proposed method reaches detection accuracies of 96.17% and 97.89%. The results show that the proposed method is feasible and offers better performance for multitemporal remote-sensing image change detection.
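The pipeline above (saliency difference, then clustering into changed and unchanged regions) can be sketched with a simple two-cluster 1-D k-means standing in for the final step; this is an illustrative stand-in, not the paper's spatial intuitionistic fuzzy C-means:

```python
import numpy as np

def change_map(sal_t1, sal_t2, iters=20):
    """Cluster the absolute saliency difference into 'unchanged' (low) and
    'changed' (high) values with a two-centre 1-D k-means."""
    diff = np.abs(sal_t1 - sal_t2).ravel()
    # Initialise the two cluster centres at the extremes of the difference values.
    lo, hi = diff.min(), diff.max()
    for _ in range(iters):
        assign = np.abs(diff - lo) <= np.abs(diff - hi)  # True -> "unchanged" cluster
        lo = diff[assign].mean() if assign.any() else lo
        hi = diff[~assign].mean() if (~assign).any() else hi
    changed = np.abs(diff - hi) < np.abs(diff - lo)
    return changed.reshape(sal_t1.shape)
```

The fuzzy, spatially regularised SIFCM of the paper replaces the hard assignment with membership degrees and neighbourhood information, but the input (a saliency difference image) and the output (a binary change map) are the same.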
- Research Article
- 10.63544/ijss.v2i4.68
- Dec 31, 2023
- Inverge Journal of Social Sciences
Remote sensing technology has emerged as a vital tool for monitoring and sustainably managing the environment. This paper reviews recent advances in remote sensing and their applications for environmental sustainability. A comprehensive literature review was conducted focusing on high-resolution analysis, temporal change detection, and hyper-spectral monitoring. Applications highlighted include detailed urban habitat mapping, assessing shoreline erosion, tracking forest disturbances, monitoring crop health, detecting pollution, and mapping coral reef degradation. The results showcase the quantitative insights remote sensing provides across diverse sustainability issues like climate change, urban planning, conservation, and disaster response. The paper emphasizes how ongoing improvements in remote sensing are enhancing environmental modelling capabilities and information availability, playing a key role in evidence-based decision-making for sustainable resource management.
- Research Article
- 10.3390/en19030845
- Feb 5, 2026
- Energies
Photovoltaic systems represent one of the most reliable and widely used technologies for electricity generation from renewable energy sources, although their performance is affected by the occurrence of faults and defects that lead to energy losses and efficiency reduction. Therefore, detecting and localizing defects in photovoltaic panels is essential. A wide variety of image analysis techniques based on aerial thermal imagery acquired by drones have been widely implemented for proper maintenance operations, requiring a comprehensive comparison among these approaches to assess their relative performance and suitability for different scenarios. This study presents a comparative evaluation of several vision-based approaches using artificial intelligence for photovoltaic defect detection. YOLO- and Transformer-based models are analyzed and benchmarked in terms of accuracy, inference time, per-class performance, and sensitivity to object size. Experimental results demonstrate that both YOLO- and Transformer-based models are computationally lightweight and suitable for real-time implementation. However, Transformer-based architectures exhibit higher detection accuracy and stronger generalization capabilities, while YOLOv5 achieves superior inference speed. The RF-DETR-Small model provides the best balance between accuracy, computational efficiency, and robustness across different defect types and object scales. These findings highlight the potential of Transformer-based vision models as a highly effective alternative for real-time, on-site photovoltaic fault detection and predictive maintenance applications.
- Research Article
- 10.1371/journal.pone.0342898
- Feb 13, 2026
- PLOS One
This paper introduces a solution to the problem of detecting whether a sequence of text is Vietnamese based on its orthography and contextual features. For readers unfamiliar with the language, understanding certain texts can be challenging: Vietnamese is a complex language that uses Latin characters with diacritics, and many of its words rely heavily on accent marks for semantic distinction. In this paper, we provide insight into how these characteristics influence Transformer-based natural language processing models and propose an approach to address the issue. Transformer-based models are selected for their superior performance compared to earlier architectures such as RNNs and LSTMs, as well as their widespread use in state-of-the-art NLP systems (GPT, BERT, T5). We examine the specific challenges posed by Vietnamese orthography and word formation, and propose a solution that enhances the model's ability to distinguish Vietnamese text. Our approach is evaluated on a benchmark dataset, demonstrating high accuracy and robustness in Vietnamese text detection and outperforming conventional methods. The results confirm that Transformer-based models can effectively learn orthographic and contextual patterns in Vietnamese, contributing to improved language identification and multilingual NLP processing.
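For contrast with the transformer approach, the orthographic signal this abstract points to can already be captured by a naive character-frequency baseline; the character set and threshold below are illustrative assumptions, not the paper's method:

```python
# Letters (beyond plain Latin) characteristic of Vietnamese orthography:
# the base vowels/consonant with Vietnamese diacritics and their tone-marked forms.
VIETNAMESE_CHARS = set(
    "ăâđêôơưáàảãạắằẳẵặấầẩẫậéèẻẽẹếềểễệíìỉĩị"
    "óòỏõọốồổỗộớờởỡợúùủũụứừửữựýỳỷỹỵ"
)

def looks_vietnamese(text, threshold=0.02):
    """Naive orthographic baseline: flag text whose share of
    Vietnamese-specific letters exceeds a threshold."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    ratio = sum(c in VIETNAMESE_CHARS for c in letters) / len(letters)
    return ratio >= threshold
```

A baseline like this fails exactly where the paper aims: text typed without diacritics, or short fragments, carry no orthographic signal, which is where contextual transformer features become necessary.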
- Research Article
- 10.1109/jstars.2022.3175200
- Jan 1, 2022
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Extracting buildings from remote sensing images is an important task with a variety of applications. Considerable attention has focused on achieving new SOTA accuracy with ever more advanced deep learning models. However, the resulting models still hardly generalize across geographical areas, hindering the practical use of SOTA approaches. To address this problem, we established a baseline for cross-area generalization using the available building extraction (BE) datasets. In addition to two popular FCN-based models, we adapted two novel transformer-based models, Swin Transformer and SegFormer, all of which reach SOTA accuracy with little difference when tested within a single area. However, experimental results show that all models fail to generalize to a different area. We then propose fine-tuning models pre-trained on one area on a small subset of an unseen area, the effectiveness of which depends on the model choice and the amount of tuning data. By jointly exploiting transfer learning and the multiscale feature learning ability of SegFormer, a distinct improvement is achieved over Swin Transformer and the FCN-based models trained on the same amount of data: the commonly used IoU metric increases from 38.97% to 70.86%, and from 48.36% to 74.51%, when using a 10% and 30% subset of the target area, respectively. The influence of model choice and tuning data size is also investigated. Our work complements algorithm development and within-area model evaluation in the active field of BE from RS images.
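The IoU figures quoted above are intersection over union between a predicted building mask and the reference mask; a minimal sketch for binary masks (treating two empty masks as perfect agreement is an assumption, conventions vary):

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union between two binary masks (building vs. background)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(a, b).sum() / union
```

For example, two 8-pixel masks that overlap on 4 pixels give IoU = 4 / 12 ≈ 0.33, illustrating how quickly the score drops when predicted and reference footprints are misaligned.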