SIGLNet: a semantic information-guided lightweight stereo matching network for satellite imagery

Abstract

Aerospace photogrammetry serves as a critical methodology for large-scale 3D reconstruction, where disparity estimation constitutes the fundamental technology. High-resolution satellite images cover many types of complex scenes, which poses a great challenge for traditional stereo matching methods. Although deep learning-based disparity estimation networks have gained significant traction in recent years, research on satellite image stereo matching predominantly prioritizes model accuracy while neglecting computational efficiency and model complexity, which runs counter to the general trend toward onboard intelligent processing in modern remote sensing satellites. To bridge this gap, we propose SIGLNet, a semantic information-guided lightweight end-to-end stereo matching network specifically designed for high-resolution satellite stereo imagery. SIGLNet first employs a lightweight backbone to extract multi-scale deep features, which are fused into downsampled low-resolution feature maps. Using these representations, we then construct a low-resolution cost volume and perform efficient cost aggregation via inverted residual blocks and multi-branch adjustable bottleneck modules, producing an initial low-resolution disparity map. Finally, a semantic information-guided disparity upsampling module reconstructs high-quality full-resolution disparity maps. Extensive experiments conducted on the US3D and WHUStereo benchmarks demonstrate that the proposed SIGLNet achieves the highest disparity estimation accuracy while maintaining competitive model complexity compared to widely used methods.
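
As a rough illustration of the low-resolution cost volume idea in the abstract, the following minimal Python sketch builds an absolute-difference cost volume along a single rectified scanline, picks the lowest-cost disparity per pixel, and upsamples the result. It is not SIGLNet: learned features, cost aggregation and semantic guidance are all omitted, and the names `cost_volume`, `winner_take_all` and `upsample_disparity` are hypothetical.

```python
def cost_volume(left, right, max_disp):
    """Absolute-difference cost for each pixel x and disparity d (left[x] vs right[x - d])."""
    volume = []
    for x in range(len(left)):
        costs = []
        for d in range(max_disp + 1):
            if x - d >= 0:
                costs.append(abs(left[x] - right[x - d]))
            else:
                costs.append(float("inf"))  # candidate falls outside the image
        volume.append(costs)
    return volume


def winner_take_all(volume):
    """Pick the lowest-cost disparity for every pixel (first one wins on ties)."""
    return [min(range(len(costs)), key=costs.__getitem__) for costs in volume]


def upsample_disparity(disp, scale):
    """Nearest-neighbour upsampling; disparity values grow with the image width."""
    return [d * scale for d in disp for _ in range(scale)]


# Toy scanline (assumed data): the left row is the right row shifted by two pixels.
right = [10, 20, 30, 40, 50, 60]
left = [7, 8, 10, 20, 30, 40]
low_res = winner_take_all(cost_volume(left, right, max_disp=2))
full_res = upsample_disparity(low_res, scale=2)
```

Working at low resolution shrinks both the image width and the disparity range, which is why the disparity values have to be multiplied by the scale factor when the map is brought back to full resolution.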

Similar Papers
  • Conference Article
  • Citations: 7
  • 10.1117/12.453690
Semi-automatic road extraction from IKONOS satellite image
  • Jan 25, 2002
  • Taehun Yoon + 2 more

A semi-automatic road extraction method for high-resolution (1-m) satellite images is presented. Since IKONOS, a high-resolution (1-m) satellite, has been launched and several companies plan to launch high-resolution satellites, the extraction of man-made objects from high-resolution satellite images has become a main concern of many scientists. The method consists of three phases: 1) a NUBS (Non Uniform B-Spline) curve is formed from given seed points; 2) a road candidate area is made by straightening the image along the NUBS curve; 3) finally, the road is extracted by a tracking algorithm that uses adaptive least squares correlation matching and linearity. Because the image is straightened, the tracking algorithm extracts roads accurately even where there are road gaps, and the size of the matrix for least squares correlation matching can be reduced. We test our method on a high-resolution (1-m) satellite (IKONOS) image. The test result shows that our method is robust and can be one of the feasible solutions for mapping from high-resolution (1-m) satellite images.

  • Research Article
  • 10.5194/isprs-archives-xlviii-1-w6-2025-205-2025
Stereo Matching and Digital Surface Model Generation for Satellite Imagery: From Scanline Aggregation to Deep Learning with RAFTStereo
  • Dec 31, 2025
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Yazgı Nur Sayın + 1 more

Abstract. Digital Surface Model (DSM) generation from satellite stereo images is one of the key applications in both computer vision and photogrammetry. Recent progress in high-resolution satellite imaging and deep learning has favoured their application to accuracy enhancement and automation in DSM generation. However, complex acquisition geometries from satellite imaging, regions of repetitive textures or patterns, and varying atmospheric conditions continue to complicate the process of dense stereo matching. This study presents a detailed framework for DSM generation: image preprocessing, epipolar rectification, disparity estimation, and 3D reconstruction. At the stereo image matching stage, it compares traditional methods such as Semi-Global Matching (SGM) and More Global Matching (MGM) with a deep learning-based approach, RAFTStereo. Experimental results with WorldView-3 satellite stereo pairs of the Data Fusion Contest 2019 (DFC2019) dataset show that while SGM and MGM remain robust and computationally efficient, RAFTStereo performs better, especially on radiometrically and geometrically complex scenes. MGM yields the lowest numerical errors (≈2–4 meters), while RAFTStereo offers more coherent disparity maps, with smoother surfaces and fewer artifacts. These results also point to the complementary nature of traditional approaches and learning-based methodologies.

  • Research Article
  • Citations: 12
  • 10.1371/journal.pone.0251657
A stereo matching algorithm based on the improved PSMNet.
  • Aug 19, 2021
  • PLOS ONE
  • Zedong Huang + 3 more

Deep learning based on a convolutional neural network (CNN) has been successfully applied to stereo matching. Compared with traditional methods, the speed and accuracy of this approach have been greatly improved. However, existing CNN-based stereo matching frameworks often encounter two problems. First, existing stereo matching networks have many parameters, which makes the matching run time too long. Second, disparity estimation is inadequate in some regions, where reflections, repeated textures, and fine structures may lead to ill-posed problems. Through a lightweight improvement of the PSMNet (Pyramid Stereo Matching Network) model, the poor matching in ill-conditioned areas such as repeated-texture and weak-texture areas is addressed. In the feature extraction part, ResNeXt is introduced to learn unitary feature extraction, and the ASPP (Atrous Spatial Pyramid Pooling) module is trained to extract multiscale spatial feature information. The feature fusion module is designed to effectively fuse the feature information of different scales to construct the matching cost volume. The improved 3D CNN uses a stacked encoding and decoding structure to further regularize the matching cost volume and obtain the correspondence between feature points under different parallax conditions. Finally, the disparity map is obtained by regression. We evaluate our method on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets. The experiments show that the proposed stereo matching network achieves comparable prediction accuracy and much faster running speed compared with PSMNet.
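
The final regression step can be pictured with a soft-argmin over the per-disparity costs of one pixel, a common way to turn a regularized cost volume into a sub-pixel disparity. The sketch below is a minimal pure-Python version under that assumption; whether the improved network uses exactly this form is not stated in the abstract, and `soft_argmin` is a hypothetical name.

```python
import math


def soft_argmin(costs):
    """Expected disparity under a softmax of the negated matching costs."""
    lowest = min(costs)
    # Subtracting the minimum keeps exp() numerically stable without changing the result.
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w / total for d, w in enumerate(weights))


# Costs for disparities 0..4 at one pixel (assumed values), lowest at d = 2.
estimate = soft_argmin([4.0, 1.0, 0.0, 1.0, 4.0])
```

Unlike a hard argmin, this weighted average is differentiable, so the network can be trained end to end on the regressed disparity.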

  • Research Article
  • Citations: 14
  • 10.1109/tgrs.2021.3058144
Double Propagation Stereo Matching for Urban 3-D Reconstruction From Satellite Imagery
  • Jan 1, 2022
  • IEEE Transactions on Geoscience and Remote Sensing
  • Li Zhao + 3 more

Stereo matching is one of the popular methods for acquiring depth maps in 3-D reconstruction due to its high accuracy and completeness. However, traditional stereo matching methods are suitable only for general scenes or aerial images, not for urban scenes in large-scale satellite imagery. This article presents a novel double propagation stereo matching (DPSM) method for urban 3-D reconstruction from satellite images. To make full use of the geometrical properties of man-made buildings, stereo-rectified satellite images, as input, are initialized to multiple superpixels with geometric models. Then, three similarity metrics are combined to calculate the weighted matching cost. A novel double propagation optimization is developed to iteratively optimize the weighted matching cost under the constraints of region boundaries and geometric models, and the disparity maps are obtained by minimizing the energy function. The corresponding dense 3-D point clouds and digital surface models (DSMs) are then calculated by triangulation. Qualitative and quantitative experiments on stereo images captured by the Pleiades and WorldView-2 satellites show that the proposed algorithm outperforms most state-of-the-art stereo matching algorithms in terms of preserving depth discontinuities and restoring occlusions.

  • Research Article
  • Citations: 1
  • 10.11873/j.issn.1004-0323.2005.2.228
Extraction Method of Tree Crown Using High-Resolution Satellite Image
  • Nov 14, 2011
  • Remote Sensing Technology and Application
  • Li Zeng-Yuan, Qin Xian-Lin

The crown is a key part of a tree. It is the main site of photosynthesis and an important source of the energy a tree needs to grow. Researchers therefore often use remote sensing techniques to monitor tree growth, predict tree growth increment, and judge wood quality. The availability of commercial high-resolution satellite data provides a new resource for studying the crown structure of trees by remote sensing. In this paper, a method for detecting tree crowns from a QuickBird image covering the demonstration area is studied. Based on image processing, an object-oriented image analysis method is adopted. Tree crowns are effectively extracted from the QuickBird image using a sample-based fuzzy classification method. The work also provides experience in processing high spatial resolution satellite images and lays a solid foundation for promoting the application of high-resolution satellite images in forestry and environmental construction in our country.

  • Conference Article
  • 10.1117/12.872297
Super-resolution rendering for Digital Earth applications
  • Sep 26, 2009
  • Min Li + 4 more

Digital Earth rendering applications, such as Google Earth and World Wind, allow us to explore real information about the Earth's surface. Showing the diverse details of the Earth's surface requires high-resolution satellite images. In some cases obtaining high-resolution satellite images costs too much, and sometimes no such images exist for the sites of interest. In this paper, we present a method that uses example-based super-resolution techniques combined with the image analogies framework to improve the visual quality of satellite images. Detailed high-resolution and low-resolution satellite images of the same site are regarded as example pairs to form a super-resolution filter. The filter effectively improves the resolution of low-resolution satellite images. Moreover, it preserves the coherence of the images and improves the performance of Digital Earth applications as well. The proposed method has been tested on World Wind; experimental results show the effectiveness of our method.

  • Research Article
  • Citations: 1
  • 10.1016/j.isprsjprs.2024.11.001
Selective weighted least square and piecewise bilinear transformation for accurate satellite DSM generation
  • Nov 6, 2024
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Nazila Mohammadi + 1 more

  • Conference Article
  • Citations: 3
  • 10.1117/12.413917
Roads extraction through texture from aerial and high-resolution satellite images
  • Jan 19, 2001
  • Jose A Malpica + 1 more

There have been many approaches to the extraction of roads. Even though fully automatic interpretation of aerial or satellite images is still remote, it is possible to obtain sound results from some images under some conditions. In this work we show the importance of texture and second-order statistics in the recognition of roads from satellite and aerial images. Since images of this type are in general registered, they can be combined with other information from a GIS. In this work, vector layers for road networks are used in combination with raster aerial or satellite images. Several results with high-resolution satellite and aerial images are presented. Shadows and other obstacles caused some mistakes, and they present a problem that remains to be tackled. Despite this, the importance of texture for the extraction of roads is demonstrated. Future work toward complete automation by introducing new information layers from a GIS is also discussed.

  • Conference Article
  • Citations: 3
  • 10.1109/ngct.2016.7877514
A cognitive perspective on road network extraction from high resolution satellite images
  • Oct 1, 2016
  • Naveen Chandra + 1 more

The road network is one of the key features used in remote sensing. Many automated methods have been developed in the past that are able to detect roads from high resolution satellite (HRS) images. In this study, a cognitive-based method is used for detecting road networks from HRS images. Cognitive task analysis (CTA) consists of five different stages, which are integrated with the nearest neighbour classification approach. This approach depends on the manual selection of a training data set from HRS images. The method is tested using ten different high resolution satellite images of a suburban region. The overall precision and recall computed for the test images are 70.24% and 77.18%, respectively.
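
For readers unfamiliar with the two reported metrics, the sketch below computes precision and recall from a predicted and a reference road mask, both flattened to lists of 0/1 labels. The masks are made-up toy data, not the study's images, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(predicted, reference):
    """Precision and recall of the positive (road) class for two binary masks."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # road pixels found correctly
    fp = sum(1 for p, r in pairs if p and not r)    # background labelled as road
    fn = sum(1 for p, r in pairs if not p and r)    # road pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy masks (assumed data): 1 = road, 0 = background.
predicted = [1, 1, 1, 0, 0, 1]
reference = [1, 1, 0, 1, 0, 1]
precision, recall = precision_recall(predicted, reference)
```

Precision penalizes false road detections, while recall penalizes missed road pixels, so the two numbers are usually reported together.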

  • Conference Article
  • 10.1117/12.2229509
Stereoscopic depth of field: why we can easily perceive and distinguish the depth of neighboring objects under binocular condition than monocular
  • Jun 1, 2016
  • Kwang-Hoon Lee + 1 more

  • Conference Article
  • 10.1117/12.2229453
Hierarchical bilateral filtering based disparity estimation for view synthesis
  • Jun 1, 2016
  • Hong-Chang Shin + 3 more

In this paper, we introduce a highly efficient and practical disparity estimation method using hierarchical bilateral filtering for real-time view synthesis. The proposed method is based on hierarchical stereo matching with hardware-efficient bilateral filtering. Hardware-efficient bilateral filtering is different from the exact bilateral filter. The purpose of the method is to design an edge-preserving filter that can be efficiently parallelized on hardware. The proposed hierarchical bilateral filtering based disparity estimation is essentially a coarse-to-fine use of stereo matching with bilateral filtering. It works as follows: first, the hierarchical image pyramid is constructed; the multi-scale algorithm then starts by applying local stereo matching to the downsampled images at the coarsest level of the hierarchy. After the local stereo matching, the estimated disparity map is refined with bilateral filtering. The refined disparity map is then adaptively upsampled to the next finer level. The upsampled disparity map is used as a prior for the corresponding local stereo matching at the next level, then filtered, and so on. The method we propose is essentially a combination of hierarchical stereo matching and hardware-efficient bilateral filtering. Visual comparison using real-world stereoscopic video clips shows that the method gives better results than one of the state-of-the-art methods in terms of robustness and computation time.
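
The refinement step can be illustrated with a textbook joint bilateral filter applied to one disparity row and guided by the image intensities. This is the standard filter, not the hardware-efficient variant the abstract refers to; the function name, parameters and toy data are assumptions.

```python
import math


def joint_bilateral_1d(disparity, guide, radius=2, sigma_s=1.5, sigma_r=10.0):
    """Smooth a disparity row, weighting neighbours by distance and guide similarity."""
    n = len(disparity)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            similarity = math.exp(-((guide[i] - guide[j]) ** 2) / (2 * sigma_r ** 2))
            weight = spatial * similarity
            num += weight * disparity[j]
            den += weight
        out.append(num / den)  # den > 0 because the centre pixel always has weight 1
    return out


# Toy row (assumed data): an intensity edge coincides with a disparity edge.
guide = [0, 0, 0, 200, 200, 200]
noisy = [0.0, 0.4, 0.0, 10.0, 9.6, 10.0]
refined = joint_bilateral_1d(noisy, guide)
```

Because neighbours across the intensity edge receive almost zero weight, noise is averaged out on each side while the disparity jump itself stays sharp, which is the edge-preserving behaviour the coarse-to-fine scheme relies on.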

  • Book Chapter
  • 10.1007/978-981-15-0108-1_35
Comparative Study of RBF and Naïve Bayes Classifier for Road Detection Using High Resolution Satellite Images
  • Jan 1, 2019
  • Anand Upadhyay + 3 more

Road detection is one area of satellite image classification. Satellite image classification plays a vital role in monitoring the different resources available on the earth's surface. Here, high-resolution satellite data of different parts of the Mumbai, Maharashtra, India region are acquired from Google Earth for road detection. This research paper uses two algorithms, a radial basis function neural network and a Naive Bayes classifier, to detect road features in the high-resolution satellite image. Both algorithms are implemented using the Matlab simulation toolbox. Both are supervised classification techniques applied to the high-resolution satellite image. Extracting roads from satellite images is a very difficult task because rural areas contain many unstructured roads, which may consist of mud and concrete. After applying the algorithms to the high-resolution satellite image, the accuracy of the classifiers is calculated using the confusion matrix and the Kappa coefficient. The accuracy of Naive Bayes is found to be 91% with a Kappa value of 0.698, and the accuracy of the radial basis function network is found to be 99% with a Kappa value of 0.9831. The accuracy calculation using the confusion matrix and Kappa value shows that the radial basis function neural network classifier performs better than the Naive Bayes classifier for road detection in high-resolution satellite images.
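
The two accuracy measures mentioned here can both be derived from a confusion matrix. The sketch below computes overall accuracy and Cohen's kappa for a square matrix whose rows are reference classes and columns are predicted classes; the matrix values are invented for illustration and are not the paper's data.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance: sum over classes of (row total * column total).
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1.0 - expected)


# Toy road / non-road confusion matrix (assumed counts).
accuracy, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa discounts the agreement that would occur by chance, which is why a classifier with 91% accuracy can still have a much lower kappa when one class dominates the image.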

  • Dissertation
  • 10.3990/1.9789036534567
Dense stereo matching: in the pursuit of an ideal similarity measure
  • Jan 7, 2019
  • Sanja Damjanovic

The aim of stereo matching is to find a corresponding point for each pixel in a reference image of a stereo image pair in the other image. Corresponding points are projections onto the stereo images of the same scene point. Finding corresponding points is an essential problem in dense stereo matching. The relative displacement between the corresponding points in rectified stereo images is termed disparity. Stereo matching is ambiguous because of photometric issues, surface structure and geometric ambiguities. Finding corresponding points within uniformly colored regions or surfaces with repeating texture or structure is a huge problem. Some points do not have corresponding points due to occlusion or due to the limited field of view. We defined a probabilistic framework for stereo matching using a one-dimensional hidden Markov model. We showed that the particle filter and the particle filter followed by smoothing can be used in disparity estimation. We introduced and qualitatively compared five probabilistic algorithms for disparity estimation: the forward algorithm, the forward/backward algorithm, the Viterbi algorithm, the particle filter and the particle filter in combination with smoothing. We derived a new likelihood function for correspondence that is optimal in a probabilistic sense. We deviated from the squared window based likelihood in order to include only relevant pixels in the likelihood function. We introduced local stereo matching using sparse windows. This approach gave us a significant improvement compared to matching based on the complete windows. Further led by the idea that a different nature of texture requires a different approach to likelihood estimation, we redefined several of the most common assumptions and established a relationship between the texture and the fronto-parallel assumption and introduced local adaptive segmentation based on the local intensity variation. 
We redefined the Lambertian assumption for offset compensation and introduced novel preprocessing and postprocessing steps for accurate disparity map estimation. We demonstrated the performance of our algorithm on benchmark images from the Middlebury database and on our own examples, and showed that the disparity maps of scenes of different natures are successfully estimated.

  • Research Article
  • Citations: 164
  • 10.3390/rs10010131
Geospatial Object Detection in High Resolution Satellite Images Based on Multi-Scale Convolutional Neural Network
  • Jan 18, 2018
  • Remote Sensing
  • Wei Guo + 3 more

Daily acquisition of large amounts of aerial and satellite images has facilitated subsequent automatic interpretation of these images. One such interpretation is object detection. Despite the great progress made in this domain, the detection of multi-scale objects, especially small objects in high resolution satellite (HRS) images, has not been adequately explored. As a result, detection performance turns out to be poor. To address this problem, we first propose a unified multi-scale convolutional neural network (CNN) for geospatial object detection in HRS images. It consists of a multi-scale object proposal network and a multi-scale object detection network, both of which share a multi-scale base network. The base network produces feature maps with different receptive fields to handle objects of different scales. Then, we use the multi-scale object proposal network to generate high quality object proposals from the feature maps. Finally, we use these object proposals with the multi-scale object detection network to train a good object detector. Comprehensive evaluations on a publicly available remote sensing object detection dataset and comparisons with several state-of-the-art approaches demonstrate the effectiveness of the presented method. The proposed method achieves the best mean average precision (mAP) value of 89.6% and runs at 10 frames per second (FPS) on a GTX 1080Ti GPU.

  • Research Article
  • Citations: 17
  • 10.1111/coin.12339
RETRACTED: Water‐body segmentation from satellite images using Kapur's entropy‐based thresholding method
  • Jun 14, 2020
  • Computational Intelligence
  • A Aalan Babu + 1 more

Water body segmentation helps in extracting water bodies such as lakes, ponds, rivers, and reservoirs from high resolution satellite images. It also helps in discovering new water bodies. However, extracting water bodies from satellite images is quite complicated, mainly due to the severe disparity in size, shape, and appearance of the water bodies. In this article, Kapur's entropy‐based thresholding method is proposed for the segmentation of water bodies from Very High Resolution (VHR) satellite images. The dataset used in this article is the AIRS (Aerial Imagery for Roof Segmentation) dataset, with VHR satellite images, from which only the images with water bodies are considered. Experimental results show that the proposed method yields better segmentation performance, with an overall accuracy of 98.43% and a Structural Similarity Index rate of 0.9712.
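
Kapur's method chooses the grey-level threshold that maximizes the sum of the entropies of the two classes it creates. The sketch below is a minimal pure-Python version operating on an intensity histogram; it illustrates the general technique only, not the article's full pipeline, and the histogram values are invented.

```python
import math


def class_entropy(probs):
    """Shannon entropy of a class after renormalizing its bin probabilities."""
    weight = sum(probs)
    if weight <= 0:
        return None  # empty class: this threshold is not usable
    return -sum((p / weight) * math.log(p / weight) for p in probs if p > 0)


def kapur_threshold(histogram):
    """Return the bin index t maximizing entropy(bins <= t) + entropy(bins > t)."""
    total = sum(histogram)
    probs = [count / total for count in histogram]
    best_t, best_score = 0, float("-inf")
    for t in range(len(probs) - 1):
        low = class_entropy(probs[: t + 1])
        high = class_entropy(probs[t + 1:])
        if low is None or high is None:
            continue
        if low + high > best_score:
            best_t, best_score = t, low + high
    return best_t


# Toy bimodal histogram (assumed data): dark water bins and bright land bins.
threshold = kapur_threshold([10, 10, 0, 0, 10, 10])
```

For a clearly bimodal histogram like this one, the selected threshold separates the two modes, so pixels at or below it can be labelled as one class and the rest as the other.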
