- Research Article
- 10.1080/10095020.2025.2611521
- Mar 12, 2026
- Geo-spatial Information Science
- Bofeng Li + 3 more
ABSTRACT Extra-wide-lane real-time kinematic (ERTK) is a technique that makes full use of extra-wide-lane (EWL) observations to realize instantaneous precise positioning. Beyond previous studies based on triple-frequency signals, hexa- and penta-frequency signals, referred to as hyper-frequency signals in this study, are now available from the BeiDou-3 and Galileo systems, respectively, and are clearly beneficial to ERTK. In this study, the advantages and performance of hyper-frequency ERTK (HERTK) are analyzed in depth. The mathematical model of generalized HERTK is derived with canonical formulae to show how model parameters benefit from additional signals and high-precision EWL/WL observations. Specifically, the optimal linear combinations of hyper-frequency signals are determined for ionosphere-weighted and ionosphere-float models. The precision gains of both position and ambiguity parameters are numerically demonstrated for single- and multi-epoch cases, accompanied by a clear explanation of the hyper-frequency enhancement mechanism. The performance of HERTK is evaluated with three long baselines from 248.4 to 511.0 km. The results show that HERTK achieves instantaneous decimeter-level solutions without the need for complicated narrow-lane (NL) ambiguity resolution (AR). Furthermore, centimeter-level HERTK can be realized by accumulating NL phase data over only about 20 epochs, which essentially leverages the more precise between-epoch information to smooth the noisy solutions. Besides the smoothed positions, the precision of the NL ambiguities is also significantly improved, enabling rapid and reliable NL AR for long baselines. A higher accuracy of 1–2 cm is achieved within 10–30 epochs.
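The benefit of additional frequencies comes from the long wavelengths and modest error amplification of inter-frequency linear combinations. As a minimal sketch (not the paper's derivation), the following computes the combined wavelength and first-order ionospheric amplification factor of an integer phase combination; the GPS L1/L2 wide-lane is used only as a familiar sanity check.

```python
# Combined wavelength and ionospheric amplification of an integer
# phase combination sum_k i_k * phi_k (phi_k measured in cycles).
# A minimal sketch of standard multi-frequency combination formulas.
C = 299_792_458.0  # speed of light, m/s


def combination(coeffs, freqs):
    """Return (wavelength [m], first-order iono amplification relative
    to the delay at freqs[0], in meters per meter)."""
    fc = sum(i * f for i, f in zip(coeffs, freqs))  # combined frequency
    lam = C / fc                                    # combined wavelength
    f1 = freqs[0]
    beta = f1 ** 2 * sum(i / f for i, f in zip(coeffs, freqs)) / fc
    return lam, beta


# Classic GPS wide-lane (1, -1) on L1/L2 as a familiar check:
# wavelength ~0.862 m, iono factor exactly -f1/f2.
lam_wl, beta_wl = combination([1, -1], [1575.42e6, 1227.60e6])
```

For hyper-frequency systems, the same function can be evaluated over BeiDou-3 or Galileo frequency sets to search for EWL combinations with long wavelengths and small `beta`.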
- Research Article
- 10.1080/10095020.2026.2628441
- Mar 6, 2026
- Geo-spatial Information Science
- Jichong Yin + 5 more
ABSTRACT A large-capacity, high-quality building extraction dataset is the basis for intelligently extracting buildings from large-scale heterogeneous images. A deep learning-based automatic building alignment and correction model is proposed to address the difficulty of updating building extraction datasets derived from remote sensing images, and an intelligent update method for such datasets is implemented on top of this model. The model first uses a multiscale fully convolutional encoder-decoder architecture to match building instances between map data and remote sensing images and to obtain classification labels (similar, redundant, and missing); it then performs pairing via an instance transformation network. A spatial transformation is applied to similar labels to obtain aligned building labeling results. Finally, redundant labels are deleted and missing labels obtained from the segmentation process are regularized to achieve instance optimization, yielding corrected building labeling results. Experiments show that the proposed automatic alignment and correction method effectively improves the accuracy of building labeling and significantly improves the efficiency of dataset updates while satisfying dataset quality requirements.
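The spatial-transformation step for "similar" labels can be pictured with a deliberately simple stand-in: the paper learns the transform with an instance transformation network, whereas the sketch below merely translates a building polygon so its centroid matches the matched image instance. All function names are illustrative.

```python
def centroid(poly):
    """Vertex-average centroid of a polygon given as (x, y) tuples
    (adequate for this illustration; not an area-weighted centroid)."""
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)


def align_to(poly, target_xy):
    """Translate a label polygon so its centroid lands on target_xy.
    A hypothetical stand-in for the learned spatial transformation."""
    cx, cy = centroid(poly)
    dx, dy = target_xy[0] - cx, target_xy[1] - cy
    return [(x + dx, y + dy) for x, y in poly]


# A unit-ish square label shifted onto a detected instance centroid.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
aligned = align_to(square, (10.0, 10.0))
```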
- Research Article
- 10.1080/10095020.2026.2615564
- Mar 2, 2026
- Geo-spatial Information Science
- Saifei Tu + 4 more
ABSTRACT High-resolution (HR) land-cover mapping is an important task for surveying the Earth's surface and supporting decision-making in sectors such as agriculture, forestry, and smart cities. However, it is impeded by the scarcity of high-quality HR labels, complex ground details, and high computational cost. To address these challenges, we propose VCNet, a weakly supervised end-to-end deep learning network for large-scale HR land-cover mapping. It leverages easily accessible low-resolution (LR) land-cover products as the sole source of supervision, fully eliminating the need for manual annotation. In VCNet, we propose a cross-feature learning backbone to learn complete details of various land objects for fine-scale land-cover mapping. In addition, it is hybridized with a high-resolution maintaining module and label refining strategies to constantly refine coarse LR labels for guiding the training. Extensive experiments on the Chesapeake Bay dataset demonstrate the superiority of VCNet in generating HR land-cover maps from LR labels. Furthermore, we constructed the Tokyo dataset to analyze VCNet's sensitivity to different LR labels. To verify its practical application potential, VCNet was used to produce a 1 m resolution land-cover map of Shanghai (China's economic epicenter) from a lower-resolution (10 m) product, greatly enriching complex ground details. Moreover, given the importance of transportation networks in highly urbanized regions, we introduced a road category in the practical mapping of Shanghai, filling a critical gap in traditional land-cover classification systems. This contribution offers a scalable solution for evidence-based decision-making in comparable developed regions. Our code is available at: https://github.com/Tusaifei/VCNet.
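Using an LR product as the only supervision requires first bringing it onto the HR pixel grid. Nearest-neighbor upsampling, shown below, is the simplest way to obtain coarse per-pixel guidance that refinement strategies can then correct; whether VCNet uses exactly this resampling is an assumption of the sketch.

```python
def upsample_nearest(label, factor):
    """Nearest-neighbor upsampling of a 2D class-label grid by an
    integer factor, e.g. bringing a 10 m land-cover product onto a
    1 m training grid would use factor=10."""
    h, w = len(label), len(label[0])
    return [[label[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]


# A 2x2 coarse label map expanded to a 4x4 grid of coarse guidance.
coarse = [[1, 2],
          [3, 4]]
fine = upsample_nearest(coarse, 2)
```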
- Research Article
- 10.1080/10095020.2026.2626665
- Mar 2, 2026
- Geo-spatial Information Science
- Chen Long + 5 more
ABSTRACT Diameter at Breast Height (DBH) and Tree Height (TH) are key structural parameters for monitoring roadside trees. Traditional field surveys and LiDAR scanning are either inefficient or expensive. Therefore, we propose an innovative method to compute tree structure parameters using low-cost, high-coverage street-view images. Existing image-based methods often rely on fixed scale priors (e.g. a fixed camera height) or require manual interpretation, which results in poor generalization, low accuracy, and inefficiency. Inspired by how humans understand the 3D world, we integrate semantic and geometric cues to overcome these challenges. Specifically, we propose the first end-to-end tree structure parameter computation network, named TSC-Net. It makes several contributions: (1) To extract robust semantic and geometric information, we integrate a decoupled dual-branch feature encoder, which strengthens multimodal information extraction through a separated dual-path encoding structure. (2) We design a Multimodal Cue-collaborative Guided Regression Module (MCGRM). Its core innovation is the introduction of two auxiliary tasks (distance regression and tree mask regression), which guide the network to focus on the semantic and geometric cues central to the tree measurement task. Finally, we develop a new dataset for evaluation, on which TSC-Net achieves a Normalized Root Mean Square Error (NRMSE) of 0.20 for DBH and 0.15 for TH, significantly outperforming existing comparative methods (0.44 and 0.24, respectively). TSC-Net also reduces measurement time from 0.67 h to 0.143 s, offering an efficient solution for roadside tree monitoring.
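The NRMSE figures quoted above depend on the normalization convention. A common range-normalized definition is sketched below; the abstract does not state the paper's exact normalizer, so treat the divisor as an assumption.

```python
def nrmse(pred, truth):
    """Root-mean-square error normalized by the range of the reference
    values. The normalizer (range vs. mean) is a convention choice."""
    n = len(truth)
    rmse = (sum((p - t) ** 2 for p, t in zip(pred, truth)) / n) ** 0.5
    return rmse / (max(truth) - min(truth))


# Toy DBH example in centimeters (values are illustrative only):
# errors of 1, 2, 2 cm against a 20 cm reference range.
err = nrmse([31.0, 42.0, 48.0], [30.0, 40.0, 50.0])
```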
- Research Article
- 10.1080/10095020.2026.2619233
- Mar 2, 2026
- Geo-spatial Information Science
- Aokun Liang + 7 more
ABSTRACT Remote sensing vision-language models (RSVLMs) have made notable progress in bridging the semantic gap between satellite imagery and natural language. However, two fundamental limitations persist. First, existing vision-language corpora for remote sensing are constructed using general-purpose models and lack integration of structured geographic priors from authoritative remote sensing resources. Second, current RSVLMs do not explicitly model geometric boundaries or spatial relations, leading to suboptimal image-text alignment. To address these limitations, we introduce GeoPrior, a large-scale tri-modal dataset comprising 828k satellite images, 2.48 million textual descriptions, and aligned rasterized maps. GeoPrior encodes detailed geographic priors, including geometric structures, topological relationships, and semantic attributes, extracted from authoritative vector maps. These priors guide GPT-4o in generating knowledge-rich captions tailored to remote sensing, while the rasterized maps provide an additional modality that captures fine-grained boundary information beyond the expressiveness of natural language. Building upon GeoPrior, we propose GeoPriorCLIP, a vision-language model tailored for remote sensing. Our key technical contribution is a geo-aware cross-modal attention module, which injects map-derived spatial priors into the CLIP image encoder to enhance visual representation with explicit geometric and topological awareness. Extensive experiments on 18 public benchmark datasets across four tasks (zero-shot classification, cross-modal retrieval, semantic localization, and zero-shot semantic segmentation) demonstrate that GeoPriorCLIP consistently outperforms state-of-the-art RSVLMs. The code and data will be released upon acceptance.
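The geo-aware cross-modal attention can be pictured as ordinary dot-product attention whose logits receive an additive bias derived from the rasterized map. The sketch below is a single-query toy version under that reading; the real module operates on patch tokens inside the CLIP image encoder, and `prior_bias` is a hypothetical stand-in for the map-derived priors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def prior_biased_attention(q, keys, values, prior_bias):
    """Single-query scaled dot-product attention with an additive
    logit bias. With zero bias this reduces to plain attention."""
    d = len(q)
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + b
              for k, b in zip(keys, prior_bias)]
    w = softmax(logits)
    dim = len(values[0])
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)]


# With identical keys and zero bias, attention weights are uniform,
# so the output is the mean of the values.
out = prior_biased_attention([1.0, 0.0],
                             [[1.0, 0.0], [1.0, 0.0]],
                             [[0.0], [2.0]],
                             [0.0, 0.0])
```

A large bias on one key pulls essentially all attention weight to it, which is how a spatial prior can steer the encoder toward map-indicated regions.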
- Research Article
- 10.1080/10095020.2026.2624287
- Mar 2, 2026
- Geo-spatial Information Science
- Yutao Liu + 6 more
ABSTRACT Machine learning is widely employed in landslide susceptibility assessment (LSA). To address the weak classification ability of individual machine learning models and the unknown effectiveness of various models across different study areas, we propose a three-level stacking ensemble strategy for LSA. This study focuses on the eastern part of Enshi City, Hubei Province, using 676 historical landslide points within the area as samples. Twelve landslide conditioning factors (LCFs) related to terrain, geology, hydrology, and human activities were selected. Models were constructed using seven basic machine learning classifiers to obtain the landslide susceptibility index of the study area and to generate landslide susceptibility maps. Subsequently, heterogeneous ensemble strategies, including voting, stacking, and three-level stacking (3LStacking), were applied to update the maps. We used the area under the receiver operating characteristic (ROC) curve (AUC) and the mean squared error (MSE) as evaluation metrics. The results indicate that the heterogeneous ensemble strategies outperform the basic classifiers. Among them, the proposed method achieved the highest accuracy, with an AUC of 0.950 and an MSE of 0.058. This suggests that 3LStacking significantly enhances the performance of machine learning modeling and is a reliable method for LSA. The findings of this study will contribute to improving the accuracy of regional LSA. Additionally, an analysis of LCF importance in the basic classifiers found that distance to roads and elevation are critical triggering factors for landslides in the study area, while slope, distance to streams, and distance to faults also influence landslide development to varying degrees.
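The heart of every stacking level, including each level of 3LStacking, is generating out-of-fold base-learner predictions so the next level's meta-learner never trains on leaked outputs. A minimal pure-Python sketch of that mechanism with a deliberately trivial one-feature base learner (all names illustrative, not the paper's classifiers):

```python
from statistics import mean


class ClassMeanLearner:
    """Trivial base learner: scores P(y=1) by relative distance of x
    to the two class means (a stand-in for RF, SVM, etc.)."""

    def fit(self, X, y):
        self.m1 = mean(x for x, t in zip(X, y) if t == 1)
        self.m0 = mean(x for x, t in zip(X, y) if t == 0)
        return self

    def predict_proba(self, X):
        return [abs(x - self.m0) /
                (abs(x - self.m0) + abs(x - self.m1) + 1e-12)
                for x in X]


def oof_predictions(X, y, learner_cls, k=3):
    """Out-of-fold predictions: each sample is scored by a model that
    never saw it, which is what a stacking meta-level trains on."""
    n = len(X)
    meta = [0.0] * n
    folds = [list(range(n))[i::k] for i in range(k)]
    for fold in folds:
        hold = set(fold)
        tr = [i for i in range(n) if i not in hold]
        model = learner_cls().fit([X[i] for i in tr], [y[i] for i in tr])
        for i, p in zip(fold, model.predict_proba([X[i] for i in fold])):
            meta[i] = p
    return meta


# Separable toy data: OOF scores should track the true labels.
X = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
y = [0, 0, 0, 1, 1, 1]
meta = oof_predictions(X, y, ClassMeanLearner)
```

Stacking a second or third level simply repeats this step with the previous level's `meta` columns as inputs.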
- Research Article
- 10.1080/10095020.2026.2628435
- Mar 1, 2026
- Geo-spatial Information Science
- Hengtong Shen + 4 more
ABSTRACT Self-supervised learning (SSL) enables the pre-training of foundation models without reliance on costly labeled data. Among SSL methods, contrastive learning (CL) excels at extracting robust semantic representations, even in the presence of complex interference. However, despite the success of CL in general computer vision, the significant domain gap necessitates specific adaptations for remote sensing imagery. To this end, we present a novel self-supervised method called PerA, which produces all-purpose remote sensing features from sample pairs that are perfectly aligned semantically. Specifically, PerA obtains features from sampled views by applying spatially disjoint masks to augmented images rather than random cropping. Our framework learns high-quality representations by enforcing consistency between teacher and student networks and predicting learnable mask tokens. Compared to previous contrastive methods, ours is more memory-efficient and supports larger batch sizes because it processes sparse inputs. The proposed method also adapts remarkably well to uncurated remote sensing data and mitigates the impact of potential semantic inconsistency. We additionally collected an unlabeled pre-training dataset of about 5 million remote sensing images. Experiments on multiple downstream task datasets show performance comparable to previous state-of-the-art methods at a limited model scale, validating the effectiveness of our approach. We hope this work will contribute to practical remote sensing image interpretation.
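The "spatially disjoint masks" idea can be stated concretely: sample two visible-patch sets from one image's patch grid with no overlap, so the two views are semantically aligned (same image) yet share no pixels. A sketch under that reading; the grid size and mask ratio below are illustrative, not PerA's settings.

```python
import random


def disjoint_patch_masks(num_patches, visible_ratio, seed=0):
    """Sample two disjoint sets of visible patch indices from one
    image. Alignment comes from sharing the source image; disjointness
    guarantees the views share no pixels."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    k = int(num_patches * visible_ratio)
    return set(order[:k]), set(order[k:2 * k])


# A 14x14 ViT-style patch grid with 25% of patches visible per view.
view_a, view_b = disjoint_patch_masks(196, 0.25)
```

Because each view keeps only a sparse subset of patches, the encoder processes far fewer tokens per sample, which is where the memory efficiency comes from.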
- Research Article
- 10.1080/10095020.2026.2617842
- Mar 1, 2026
- Geo-spatial Information Science
- Guanhao Zhang + 6 more
ABSTRACT Thermal infrared remote sensing retrieval is a unique way to acquire large-scale, high-precision land surface temperature (LST), but it is often affected by clouds and fog, leading to noticeable spatial gaps. To address this limitation, we propose a comprehensive framework that integrates satellite and model-simulated data to generate 1 km all-weather LST four times per day across China. The framework comprises three key modules. First, we present a data quality optimization preprocessing approach that combines quality flags with morphological processing to balance the quantity and accuracy of satellite LST observations. To address the missing data, a two-stage spatiotemporal fusion method is developed that leverages the complementarity of time-series satellite and model-simulated LST. Additionally, a local-global cascade correction postprocessing strategy is designed to progressively refine the reconstructed results, ultimately achieving stable gapless LST. Upon validation and analysis, the proposed all-weather LST demonstrates better performance than existing large-scale LST datasets, with a mean absolute error advantage of 0.15 K to 0.83 K when verified against in-situ LST. Furthermore, the proposed LST product is consistent with objective geophysical principles and historical meteorological records, and can be anticipated to support research areas such as agriculture and climate change analysis.
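The complementarity the fusion step exploits fits in one line of arithmetic: model-simulated LST is gapless but biased, so a cloud gap can be filled with the model value plus a bias estimated where both sources are valid. The sketch below uses a single global bias as an assumption; the paper's two-stage fusion and local-global cascade correction are considerably more refined.

```python
def fill_lst_gaps(sat, model):
    """Fill missing satellite LST (None) with model-simulated LST plus
    the mean satellite-minus-model bias over jointly valid pixels.
    A global-offset stand-in for the paper's spatiotemporal fusion."""
    valid = [(s, m) for s, m in zip(sat, model) if s is not None]
    bias = sum(s - m for s, m in valid) / len(valid)
    return [s if s is not None else m + bias
            for s, m in zip(sat, model)]


# Kelvin values; the middle pixel is cloud-contaminated.
filled = fill_lst_gaps([300.0, None, 302.0], [299.0, 300.0, 301.0])
```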
- Research Article
- 10.1080/10095020.2026.2627100
- Mar 1, 2026
- Geo-spatial Information Science
- Qinhan Zhang + 6 more
ABSTRACT In recent years, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have achieved significant progress in Hyperspectral Image (HSI) classification. However, in practical applications, the high cost of sample annotation and the limited availability of training samples lead to overfitting in CNNs and ViTs under few-shot learning scenarios. Siamese networks, as an effective metric learning method, show promising performance in few-shot learning due to their low dependency on sample information. However, traditional siamese networks rely on static parameter-sharing mechanisms, lack feature interaction between the two subnetworks, and struggle to effectively capture the spatial-spectral heterogeneity in hyperspectral data. Additionally, they are prone to noise interference, resulting in insufficient discriminative power of key features. To address these challenges, this paper proposes a Contextual Interaction Siamese Network for Few-Shot Hyperspectral Image Classification (CISNet). First, an Interactive Feature Fusion Module (IFFM) is introduced to capture the similarities and differences between features from the two subnetworks, thereby enhancing the discriminative power of key features. Second, an Enhanced Token Generation Module (ETGM) is designed to generate correlated class tokens for the two subnetworks. Finally, this paper innovatively proposes a Context Interaction Transformer Block (CITB) and a Guided Attention (GA) mechanism to strengthen global context interaction between the two subnetworks. Extensive experiments demonstrate that CISNet achieves superior performance under few-shot conditions and outperforms other state-of-the-art methods in classification accuracy.
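The metric-learning principle behind siamese few-shot classification can be reduced to comparing a query embedding against per-class prototypes built from the few labeled support samples. The sketch below shows only that generic principle, not CISNet's interactive architecture; the 2-D "embeddings" and class names are hand-made for illustration.

```python
def prototypes(support, labels):
    """Mean embedding per class from the labeled support set."""
    groups = {}
    for vec, lab in zip(support, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [sum(c) / len(vecs) for c in zip(*vecs)]
            for lab, vecs in groups.items()}


def classify(query, protos):
    """Assign the class of the nearest prototype (squared Euclidean
    distance, the usual choice in prototype-based few-shot methods)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda lab: d2(query, protos[lab]))


# Two classes, two support samples each; query lies near "crop".
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]]
labels = ["water", "water", "crop", "crop"]
protos = prototypes(support, labels)
pred = classify([0.9, 1.1], protos)
```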
- Research Article
- 10.1080/10095020.2026.2615485
- Mar 1, 2026
- Geo-spatial Information Science
- Sihan Zhou + 6 more
ABSTRACT Real-time processing capability is a key enabler for the next generation of intelligent satellites. Full-waveform spaceborne laser altimetry has become an indispensable tool in a wide range of scientific and engineering applications, including terrain mapping, biomass estimation, and Earth system monitoring. Within this context, accurate and efficient laser footprint localization is essential for ensuring the geometric reliability of altimetric measurements. However, conventional waveform-matching methods suffer from severe computational burdens, rendering them unsuitable for high-frequency on-orbit calibration and difficult to deploy in real-time onboard scenarios. To address these challenges, this paper proposes SLA-FLNet, a physics-guided deep learning framework that integrates key physical mechanisms of laser pulse propagation, terrain modulation, and echo formation through a multi-branch spatiotemporal architecture. Each module of SLA-FLNet explicitly encodes a physically interpretable process, enabling accurate, interpretable, and scalable footprint localization. To support supervised training in the absence of ground-truth labels, pseudo-labels were generated using a classical waveform-matching algorithm. The model was evaluated on 2379 laser footprints from 18 beams of the GaoFen-7 (GF-7) satellite, spanning 12 U.S. states with diverse terrain. SLA-FLNet achieved high prediction accuracy and delivered footprint localization results consistent with waveform matching, even in unseen geographic regions. An ablation study further highlighted the critical role of terrain encoding in enhancing structural fidelity and cross-regional generalization. Compared to traditional methods, SLA-FLNet achieved a more than 100,000× inference speedup on modern GPUs, demonstrating strong potential for real-time onboard processing. In summary, SLA-FLNet provides a physically consistent, computationally efficient, and deployment-ready solution for full-waveform footprint localization and on-orbit calibration, supporting future autonomous Earth observation missions.
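The classical waveform matching used to generate pseudo-labels amounts to scoring candidate footprint positions by how well their terrain-simulated echoes correlate with the observed waveform. A minimal normalized-cross-correlation version of that idea (candidate echo generation from the DEM is omitted, and the waveforms below are toy data):

```python
def ncc(a, b):
    """Normalized cross-correlation of two equal-length waveforms."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def best_footprint(observed, simulated_echoes):
    """Index of the candidate footprint whose simulated echo best
    matches the observed full waveform."""
    scores = [ncc(observed, s) for s in simulated_echoes]
    return max(range(len(scores)), key=scores.__getitem__)


# Three candidate positions; the middle one reproduces the echo shape.
observed = [0.0, 1.0, 3.0, 1.0, 0.0]
candidates = [[0.0, 0.0, 1.0, 3.0, 1.0],   # shifted late
              [0.0, 1.0, 3.0, 1.0, 0.0],   # aligned
              [1.0, 3.0, 1.0, 0.0, 0.0]]   # shifted early
idx = best_footprint(observed, candidates)
```

Exhaustively scoring a dense candidate grid this way is exactly the computational burden that motivates replacing the search with a single network inference.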