Investigating class granularity in historical aerial image segmentation: a comparative analysis of CNN and transformer-based models
ABSTRACT Historical aerial imagery acts as a crucial bridge to the pre-satellite era, becoming increasingly vital for quantifying long-term land cover changes and supporting retrospective environmental studies. This study presents a comprehensive comparison of CNN-based (DeepLabV3+, PSPNet, HRNet, ConvNeXt) and Transformer-based (ViT, Swin Transformer, BEiT) architectures for semantic segmentation of historical aerial imagery. In particular, the potential of Transformer-based architectures for semantic segmentation of panchromatic historical aerial imagery in the context of increasing segmentation complexity remains underexplored. We assess how effectively each architecture handles increasing class granularity and captures finer-grained land-cover semantics. Experiments are conducted at three levels of detail: binary, three-class, and five-class segmentation. The dataset comprises panchromatic images from 1960, captured in Turkey. Precision, recall, F1-score, and intersection-over-union (IoU) metrics were employed to evaluate class-wise performance. In the binary (ground versus non-ground) segmentation task, all architectures achieved mean F1-scores of 0.945–0.959 and mean IoU values of 0.905–0.921. In the three-class and five-class tasks, mean F1-scores ranged from 0.861 to 0.902 and from 0.754 to 0.803, with mean IoU values of 0.777–0.831 and 0.623–0.682, respectively. In the three-class setting, ConvNeXt, Swin Transformer, and BEiT clustered towards the upper end of the mean metrics with only small differences among them. Increasing class granularity from three to five reduced mean performance across architectures, with decreases of up to 0.123 in mean F1-score and up to 0.176 in mean IoU. Across all classification levels, ConvNeXt consistently achieved higher performance, distinguishing itself specifically in the five-class task with the highest F1-scores and IoU values.
ConvNeXt outperformed both the CNN-based baselines and Transformer-based models in the challenging building and road categories.
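The class-wise metrics named in the abstract (precision, recall, F1-score, IoU) can all be derived from a pixel-level confusion matrix. The following is a minimal sketch of that computation, not the authors' evaluation code; the function names and the toy three-class label lists are assumptions made for illustration only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def class_metrics(cm):
    """Return per-class precision, recall, F1 and IoU from a confusion matrix."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted as c, reference differs
        fn = sum(cm[c]) - tp                       # reference is c, predicted differs
        denom = tp + fp + fn
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if denom else 0.0,
            "iou": tp / denom if denom else 0.0,
        })
    return results


def mean_metric(results, key):
    """Unweighted mean of one metric over all classes."""
    return sum(r[key] for r in results) / len(results)


# Toy flattened label maps for a hypothetical three-class task (not study data).
reference = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

cm = confusion_matrix(reference, predicted, num_classes=3)
per_class = class_metrics(cm)
mean_f1 = mean_metric(per_class, "f1")
mean_iou = mean_metric(per_class, "iou")
```

For a single class, F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so the two rankings agree per class; the means over classes need not satisfy that identity, which is one reason both are usually reported.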
- Research Article
12
- 10.1016/j.rsase.2021.100477
- Feb 6, 2021
- Remote Sensing Applications: Society and Environment
Assessing long-term land cover changes in watershed by spatiotemporal fusion of classifications based on probability propagation: The case of Dniester river basin
- Research Article
45
- 10.3390/su12083331
- Apr 20, 2020
- Sustainability
The loss of farmland to urban use in peri-urban areas is a global phenomenon. Urban sprawl generates a decline in the availability of productive agricultural land around cities, causing multifaceted conflicts between nature and society and threatening the sustainability of urban agglomerations. This study aimed to uncover the spatial pattern of long-term (80 years) land cover changes in the functional urban area of Budapest, with special attention to the conversion of agricultural land. The paper is based on a unique methodology utilizing various data sources such as military-surveyed topographic maps from the 1950s, the CLC 90 from 1990, and the Urban Atlas from 2012. In addition, the multilayer perceptron (MLP) method was used to model land cover changes through 2040. The research findings showed that land conversion and the shrinkage of productive agricultural land around Budapest significantly intensified after the collapse of communism. The conversion of arable land to artificial surfaces increased, and by now, the traditional metropolitan food supply area around Budapest has nearly disappeared. The extent of forests and grasslands increased in the postsocialist period due to national afforestation programs and the demand of new suburbanites for recreational space. Urban sprawl and the conversion of agricultural land should be an essential issue during the upcoming E.U. Common Agricultural Policy (CAP) reforms.
- Research Article
6
- 10.3389/fenvs.2024.1320009
- Jan 22, 2024
- Frontiers in Environmental Science
Introduction: Highlighting and assessing land cover changes in a heterogeneous landscape, such as those with surface mining activities, allows for understanding the dynamics and status of the analyzed area. This paper focuses on the long-term land cover changes in the Jiului Valley, the largest mining basin in Romania, using Landsat temporal image series from 1988 to 2017. Methods: The images were classified using the supervised Support Vector Machine (SVM) algorithm incorporating four kernel functions and two common algorithms, Maximum Likelihood Classification (MLC) and Minimum Distance (MD). Seven major land cover classes have been identified: forest, pasture, agricultural land, built-up areas, mined areas, dump sites, and water bodies. The accuracy of every classification algorithm was evaluated through independent validation, and the differences in accuracy were subsequently analyzed. Using the best-performing SVM-RBF algorithm, classified maps of the study area were developed and used for assessing land cover changes by post-classification comparison (PCC). Results and discussions: The three algorithms displayed overall accuracies ranging from 76.56% to 90.68%. The SVM algorithms outperformed MLC by 4.87%–8.80% and MD by 6.82%–10.67%. During the studied period, changes occurred within analyzed classes, both directly and indirectly: forest, built-up areas, mined areas, and water bodies experienced increases, whereas pasture, agricultural land, and dump areas saw declines. The most notable changes between 1988 and 2017 were observed in built-up and dump areas: the built-up areas increased by 110.7%, while the dump sites decreased by 53.0%. The mined class showed an average growth of 6.5%. By highlighting and mapping long-term land cover changes in this area, along with their underlying causes, it became possible to analyze the impact of land management and usage on sustainable development and conservation efforts over time.
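Post-classification comparison (PCC) amounts to overlaying two independently classified maps and tabulating the class transitions pixel by pixel. The sketch below illustrates the idea under simple assumptions: the maps are flat lists of class labels covering the same pixels, and the helper names and toy maps are hypothetical, not taken from the study.

```python
from collections import Counter


def post_classification_comparison(map_t1, map_t2):
    """Count pixel transitions as (class at date 1, class at date 2) pairs."""
    if len(map_t1) != len(map_t2):
        raise ValueError("both maps must cover the same pixels")
    return Counter(zip(map_t1, map_t2))


def percent_area_change(map_t1, map_t2, cls):
    """Relative change in the pixel count of one class between the two dates."""
    area_t1 = map_t1.count(cls)
    area_t2 = map_t2.count(cls)
    if area_t1 == 0:
        raise ValueError("class absent at date 1; relative change is undefined")
    return 100.0 * (area_t2 - area_t1) / area_t1


# Toy classified maps (illustrative labels only, not the study's data).
map_1988 = ["forest", "pasture", "pasture", "built-up", "dump", "dump"]
map_2017 = ["forest", "forest", "built-up", "built-up", "dump", "forest"]

transitions = post_classification_comparison(map_1988, map_2017)
built_up_change = percent_area_change(map_1988, map_2017, "built-up")
dump_change = percent_area_change(map_1988, map_2017, "dump")
```

Because PCC compares two classifications, errors in either map propagate into the transition table, which is why the per-date accuracy assessment described above matters for the change estimates.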
- Research Article
1
- 10.1007/s00261-025-04887-y
- Apr 1, 2025
- Abdominal radiology (New York)
To train and validate segmentation models for automated segmentation of gallbladder cancer (GBC) lesions from contrast-enhanced CT images. This retrospective study comprised consecutive patients with pathologically proven treatment-naïve GBC who underwent a contrast-enhanced CT scan at four different tertiary care referral hospitals. The training and validation cohort comprised CT scans of 317 patients (center 1). The internal test cohort comprised a temporally independent cohort (n = 29) from center 1 (internal test 1). The external test cohort comprised CT scans from three centers (n = 85). We trained the state-of-the-art 2D and 3D image segmentation models, SAM Adapter, MedSAM, 3D TransUNet, SAM-Med3D, and 3D-nnU-Net, for automated segmentation of the GBC. The models' performance for GBC segmentation on the test datasets was assessed via dice score and intersection over union (IoU) using manual segmentation as the reference standard. The 2D models performed better than 3D models. Overall, MedSAM achieved the highest dice and IoU scores on both the internal [mean dice (SD) 0.776 (0.106) and mean IoU 0.653 (0.133)] and external [mean dice (SD) 0.763 (0.098) and mean IoU 0.637 (0.116)] test sets. Among the 3D models, TransUNet showed the best segmentation performance with mean dice (SD) and IoU (SD) of 0.479 (0.268) and 0.356 (0.235) in the internal test and 0.409 (0.339) and 0.317 (0.283) in the external test sets. The segmentation performance was not associated with GBC morphology. There was weak correlation between the dice/IoU and the size of the GBC lesions for any segmentation model. We trained 2D and 3D GBC segmentation models on a large dataset and validated these models on external datasets. MedSAM, a 2D prompt-based foundational model, achieved the best segmentation performance.
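The dice score and IoU used above both measure overlap between a predicted mask and the manual reference. A minimal sketch of the two measures for binary masks follows; the flat 0/1 list representation, the function name, and the toy masks are assumptions for illustration, and treating two empty masks as perfect agreement is a convention rather than something the abstract specifies.

```python
def dice_and_iou(pred, ref):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    pred_size = sum(1 for p in pred if p)
    ref_size = sum(1 for r in ref if r)
    union = pred_size + ref_size - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    return 2 * inter / (pred_size + ref_size), inter / union


# Toy masks (illustrative only): 3 overlapping pixels, 4 pixels in each mask.
reference_mask = [0, 1, 1, 1, 1, 0, 0, 0]
predicted_mask = [0, 0, 1, 1, 1, 1, 0, 0]

dice, iou = dice_and_iou(predicted_mask, reference_mask)
```

For any single pair of masks, Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. The identity does not carry over to means across patients, which is why reported mean dice and mean IoU values need not satisfy it exactly.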
- Research Article
73
- 10.1016/j.scitotenv.2019.134206
- Aug 30, 2019
- Science of The Total Environment
Long-term land cover change in Zambia: An assessment of driving factors
- Research Article
9
- 10.3390/land10070708
- Jul 5, 2021
- Land
After the Korean War, human access to the Korean Demilitarized Zone (DMZ) was highly restricted. However, limited agricultural activity was allowed in the Civilian Control Zone (CCZ) surrounding the DMZ. In this study, land cover and vegetation changes in the western DMZ and CCZ from 1919 to 2017 were investigated. Coniferous forests were nearly completely destroyed during the war and were then converted to deciduous forests by ecological succession. Plains in the DMZ and CCZ areas showed different patterns of land cover changes. In the DMZ, pre-war rice paddies were gradually transformed into grasslands. These grasslands have not returned to forest, and this may be explained by wildfires set for military purposes or hydrological fluctuations in floodplains. Grasslands near the floodplains in the DMZ are highly valued for conservation as a rare land type. Most grasslands in the CCZ were converted back to rice paddies, consistent with their previous use. After the 1990s, ginseng cultivation in the CCZ increased. In addition, the landscape changes in the Korean DMZ and CCZ were affected by political circumstances between South and North Korea. Our results provide baseline information for the development of ecosystem management and conservation plans for the Korean DMZ and CCZ. Given the high biodiversity and ecological integrity of the Korean DMZ region, transboundary governance for conservation should be designed.
- Research Article
96
- 10.1016/j.rse.2021.112822
- Dec 8, 2021
- Remote Sensing of Environment
Monthly mapping of forest harvesting using dense time series Sentinel-1 SAR imagery and deep learning
- Research Article
17
- 10.1016/j.ecolind.2020.106904
- Sep 9, 2020
- Ecological Indicators
Long-term forest cover and height changes on abandoned agricultural land: An assessment based on historical stereometric images and airborne laser scanning data
- Preprint Article
10
- 10.5194/egusphere-egu23-16205
- May 15, 2023
The Satellite Application Facility on Support to Operational Hydrology and Water Management (H SAF) is providing surface soil moisture data record products based on a change detection technique applied to the Advanced Scatterometer (ASCAT) on board the series of Metop satellites. At the moment two of the three Metop satellites are still operational (Metop-B and Metop-C), while the first satellite (Metop-A), launched in 2007, completed its mission in November 2021. Thus, the latest ASCAT surface soil moisture data record product covers a period of more than 15 years (2007-2022). A first analysis of long-term trends in the H SAF ASCAT surface soil moisture data record product has indicated strong anomalies for specific regions around the globe. Trend similarities have been found compared to other data sets such as soil moisture information provided by the ERA5 land surface model. However, certain soil moisture anomaly patterns did not match spatially or in their trend direction. It has been observed that land cover changes contribute to the overall ASCAT backscatter signal with a noticeable impact on the retrieved soil moisture information, especially over longer time periods (>10 years). Most notable are areas with slowly changing ground conditions such as growing cities or regions suffering deforestation. In this study we want to present a new method to mitigate the effects of long-term land cover changes based on a regular re-calibration of the dry and wet backscatter reference. It is important to address and remove these non-climatic effects from the surface soil moisture data record products to correctly detect and monitor climate extremes.
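In change detection retrievals of this kind, relative surface soil moisture is commonly obtained by scaling the observed backscatter between a dry and a wet reference. The sketch below shows why stale references bias the result after a land cover change and how re-estimating them removes the bias. It is a simplified illustration, not the H SAF algorithm: taking the references as the minimum and maximum of a time window, the uniform +2 dB shift, and all names and values are assumptions.

```python
def relative_soil_moisture(sigma0_db, dry_ref_db, wet_ref_db):
    """Scale backscatter (dB) linearly between dry (0) and wet (1) references."""
    if wet_ref_db <= dry_ref_db:
        raise ValueError("wet reference must exceed dry reference")
    ms = (sigma0_db - dry_ref_db) / (wet_ref_db - dry_ref_db)
    return min(1.0, max(0.0, ms))  # clip to the physically meaningful range


# Hypothetical backscatter samples (dB) for an early period.
early = [-14.0, -12.5, -11.0, -9.5, -8.0]
# Assumed effect of land cover change (e.g. urban growth): a uniform +2 dB shift.
late = [s + 2.0 for s in early]

# References fixed from the early period versus re-calibrated on the late period.
fixed_dry, fixed_wet = min(early), max(early)
recal_dry, recal_wet = min(late), max(late)

# The driest late-period observation should map to 0 (fully dry).
biased = relative_soil_moisture(late[0], fixed_dry, fixed_wet)
corrected = relative_soil_moisture(late[0], recal_dry, recal_wet)
```

With fixed references, the driest late-period observation appears about one third saturated, a spurious wetting trend caused purely by the backscatter shift; with re-calibrated references it maps back to fully dry.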
- Research Article
7
- 10.1007/s10661-020-8136-2
- Feb 10, 2020
- Environmental Monitoring and Assessment
Multi-date remotely sensed images comprising Landsat TM images of 1984, 1993 and 2003 and Landsat OLI images of 2013 were used to reconstruct long-term changes in land cover in the Swartkops River Estuary by mapping changes in vegetation distribution over a period of ~30 years between 1984 and 2013. These images were complemented by high-resolution near-anniversary aerial photographs that were used as ancillary sources of ground truth during supervised classification of the Landsat images. Results of our investigation point to human-induced loss of biodiversity due to persistent encroachment of different development activities on terrestrial vegetation, substantial expansion of the salt marsh due to climate change-driven relative sea level rise, and persistent increase in keystone salt marsh vegetation species, notably Zostera capensis and Spartina maritima, due to the combined influence of human-induced nutrient loading into estuarine water and relative sea level rise. These observations argue for the immediate need to embrace appropriately informed management interventions in order to enhance the sustainability of salt marsh ecosystems for the benefit of present and future generations.
- Research Article
- 10.9734/ajee/2025/v24i4689
- Apr 7, 2025
- Asian Journal of Environment & Ecology
Land use/land cover (LULC) changes are a critical concern due to their significant impact on ecosystems, biodiversity, and climate patterns. The objective of this study is to understand the dynamics of LULC changes and quantify the fragmentation in the Ken River Basin using open-access remote sensing data and the FRAGSTATS software. Landsat images from 1995, 2015, and 2022 were utilized to analyze changes over these distinct time periods. We employed supervised classification using the maximum likelihood method to produce detailed land use and land cover maps. The analysis identified five land use classes: water bodies, forest, barren land, cultivable land, and built-up land, with cultivable land emerging as the most dominant class, followed closely by forest cover. To quantify the land cover classes, various landscape metrics at the class level were employed. The results reveal a concerning trend: both forest and cultivable land classes are experiencing increasing fragmentation over time. This rising fragmentation poses significant risks to the ecological integrity and sustainability of the Ken River Basin. By quantifying long-term land cover changes, this study assesses the effectiveness of conservation efforts and utilizes remote sensing and GIS techniques to inform and enhance best management practices in the region.
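Class-level fragmentation metrics of the kind mentioned above start from identifying the connected patches of each class. The following sketch counts patches and their sizes on a small classified grid; it is an illustration of the concept rather than a reimplementation of FRAGSTATS, and the function name, the 8-neighbour default, and the toy grids ("F" for forest, "C" for cultivable land) are assumptions.

```python
from collections import deque


def class_patches(grid, cls, eight_connected=True):
    """Return the sizes (in cells) of all connected patches of class `cls`."""
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls or seen[r][c]:
                continue
            # Breadth-first flood fill over one patch.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Toy classified grids for two dates (illustrative only, not study data).
grid_1995 = ["FFFC", "FFFC", "FFCC", "CCCC"]
grid_2022 = ["FCFC", "FCCC", "CCFC", "FCCC"]

patches_1995 = class_patches(grid_1995, "F")
patches_2022 = class_patches(grid_2022, "F")
mean_patch_1995 = sum(patches_1995) / len(patches_1995)
mean_patch_2022 = sum(patches_2022) / len(patches_2022)
```

In the toy example the forest class goes from one patch of 8 cells to four patches averaging 1.25 cells: a rising patch count together with a falling mean patch size is the typical signature of increasing fragmentation.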
- Research Article
2
- 10.1007/s10021-019-00456-9
- Nov 19, 2019
- Ecosystems
Anthropogenic land use affects climate by altering the energy balance of the Earth’s surface. In temperate regions, cooling from increased albedo is a common result of historical land-use change. However, this albedo cooling effect is dependent mainly on the exposure of snow cover following forest canopy removal and may change over time due to simultaneous changes in both land cover and snow cover. In this paper, we combine modern remote sensing data and historical records, incorporating over 100 years of realized land use and climatic change into an empirical assessment of centennial-scale surface forcings in the Upper Midwestern USA. We show that, although increases in surface albedo cooled through strong negative shortwave forcings, those forcings were reduced over time by a combination of forest regrowth and snow-cover loss. Deforestation cooled strongly (−5.3 W m⁻²) and mainly in winter, while composition shift cooled less strongly (−3.03 W m⁻²) and mainly in summer. Combined, changes in albedo due to deforestation, shifts in species composition, and the return of historical forest cover resulted in −2.81 W m⁻² of regional radiative cooling, 55% less than full deforestation. Forcings due to changing vegetation were further reduced by 0.32 W m⁻² of warming from a shortened snow-covered season and a thinning of seasonal snowpack. Our findings suggest that accounting for long-term changes in land cover and snow cover reduces the estimated cooling impact of deforestation, with implications for long-term land-use planning.
- Research Article
251
- 10.1073/pnas.221053998
- Oct 23, 2001
- Proceedings of the National Academy of Sciences
Declines in habitat and wildlife in semiarid African savannas are widely reported and commonly attributed to agropastoral population growth, livestock impacts, and subsistence cultivation. However, extreme annual and shorter-term variability of rainfall, primary production, vegetation, and populations of grazers make directional trends and causal chains hard to establish in these ecosystems. Here two decades of changes in land cover and wildebeest in the Serengeti-Mara region of East Africa are analyzed in terms of potential drivers (rainfall, human and livestock population growth, socio-economic trends, land tenure, agricultural policies, and markets). The natural experiment research design controls for confounding variables, and our conceptual model and statistical approach integrate natural and social sciences data. The Kenyan part of the ecosystem shows rapid land-cover change and drastic decline for a wide range of wildlife species, but these changes are absent on the Tanzanian side. Temporal climate trends, human population density and growth rates, uptake of small-holder agriculture, and livestock population trends do not differ between the Kenyan and Tanzanian parts of the ecosystem and cannot account for observed changes. Differences in private versus state/communal land tenure, agricultural policy, and market conditions suggest, and spatial correlations confirm, that the major changes in land cover and dominant grazer species numbers are driven primarily by private landowners responding to market opportunities for mechanized agriculture, less by agropastoral population growth, cattle numbers, or small-holder land use.
- Research Article
- 10.1002/mp.70449
- Apr 24, 2026
- Medical physics
Accurate and real-time localization of thoracic tumor targets is essential for effective radiation therapy. Recently, Transformer architectures have demonstrated strong global reasoning capabilities across multiple frames by leveraging both self-attention and cross-attention mechanisms. Transformers have therefore been applied to object tracking with great success. By combining Image Guided Radiation Therapy (IGRT) technologies and deep learning-based object tracking architecture, it is possible to deliver radiation doses to the target area with high accuracy. This study develops a transformer-based patient-agnostic tracking model (TransTracking) for surface and markerless internal target tracking in thoracic tumor radiotherapy. We trained the TransTracking model using the training splits of publicly available object tracking datasets. Subsequently, for internal target tracking, the model is fine-tuned using 10,000 digitally reconstructed radiograph (DRR) images generated from the actual 4DCT datasets of 25 patients. The DRR images are annotated with bounding boxes of the moving tumor. Our method learns to directly predict the target classification and bounding-box regression weights through end-to-end training, enabling accurate target localization in each frame for both surface and internal target tracking sequences. The tracking performance of the trained model was evaluated in 20 volunteers for surface tracking and using DRR images generated from 20 4DCT datasets for internal tumor tracking. To address the limited availability of medical images for training, we applied data augmentation to the 4DCT datasets, expanding the data scale 40-fold in total. For the surface marker tracking, the mean absolute deviation (MAD) ± standard deviation (SD) between the model-predicted and the actual positions for 20 volunteers was 0.07 ± 0.06 mm, 0.12 ± 0.13 mm, and 0.29 ± 0.20 mm in left-right, superior-inferior, and anterior-posterior directions, respectively. In each directional axis, over 85% of frames exhibited a model-predicted target position within 0.5 mm of the corresponding ground-truth position. For the internal tumor tracking, the MAD ± SD between the predicted and annotated center positions of the tumor bounding boxes is 1.49 ± 1.39 mm, with a mean Intersection over Union (IoU) value of 0.83 and an area under curve (AUC) score of 82% for 20 patients. Additionally, our transformer-based model can extract the target position in 81 ms after an image is acquired. This study proposed a novel Transformer-based deep learning method aimed at training a patient-agnostic tumor motion tracking model in radiotherapy. The model enables real-time, high-precision tracking of surface markers using vision cameras, offering a cost-effective and compact solution. Additionally, we demonstrated that our method can accurately locate tumor target areas in DRR images with high precision, without the need for individualized training or the implantation of fiducial markers. This feasibility study demonstrates the strong potential of our strategy as a clinically viable solution for moving tumor IGRT.
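The tracking metrics reported above (bounding-box IoU, deviation between predicted and annotated centers, and mean absolute deviation along an axis) can be sketched as follows. This is an illustrative computation under stated assumptions, not the study's evaluation code: boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples, and all function names and sample values are hypothetical.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def mean_abs_deviation(predicted, actual):
    """Mean absolute deviation between predicted and actual positions on one axis."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical annotated and predicted tumor boxes for one frame (units: mm).
annotated_box = (0.0, 0.0, 10.0, 10.0)
predicted_box = (2.0, 0.0, 12.0, 10.0)

iou = box_iou(predicted_box, annotated_box)
offset = center_distance(predicted_box, annotated_box)
mad = mean_abs_deviation([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

In the toy frame a 2 mm lateral shift of a 10 mm box already lowers the IoU to about 0.67, which illustrates why IoU and center deviation are reported together: IoU is sensitive to both position and size errors, while the center distance isolates the positional component.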
- Book Chapter
2
- 10.1007/978-981-19-7892-0_24
- Jan 1, 2023
This paper investigates whether deep learning architectures for semantic segmentation are capable of supporting geneticists in karyotype exporting, in a more efficient manner without requiring the intervention of humans. For the sake of experiments, 62 images from the BioImLab segmentation dataset have been adopted that contain chromosomes, nucleotides, and some unknown objects. All regions of interest had been annotated manually with an emphasis on the overlapping areas between chromosomes. For this purpose, we created 10 synthetic folds, using the Holdout Cross Validation between 10 selected targeted microscope images containing all classes. The newly designed dataset is used to train five deep learning CNNs with pretrained weights using the transfer learning technique, in order to highlight the strengths and the weaknesses of each architecture in the segmentation of “Overlapping” regions. In terms of evaluation, the IoU (intersection over union) metric is used, which is widely used and accepted in cases where objects overlap. The best result was 66.67% IoU, obtained with the VGG19 model combined with U-Net, which achieved 57.1% mean IoU. The future prospects of this study are to assist the cytogeneticists to (a) remove the objects of no interest from the microscope image, (b) evaluate the suitability of the microscopic images for karyotyping, and (c) automate the karyotyping process.