3D scene reconstruction from a limited number of viewpoints
- Conference Article
2
- 10.1109/wcica.2010.5554187
- Jul 1, 2010
This paper provides a method for reconstructing unstructured 3D scenes, both indoor and outdoor. Based on the fusion of panoramic laser data and monocular vision, we use the Levenberg-Marquardt method to carry out the intrinsic calibration of a camera and the extrinsic calibration between the camera and a 3D laser range finder. We use the calibration result to color the laser data, then extend the coloring to the whole panoramic scene to complete the 3D reconstruction. To expand the range of the reconstruction, this paper also provides an improved ICP algorithm to register two scenes. To reduce the computational complexity, we use position and orientation information to preprocess the laser data and extract the overlapping region, and a KD-tree speeds up the search for matching pairs. Experimental results with a panoramic laser and monocular vision platform show the validity and practicability of the method.
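The KD-tree matching step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_kdtree`, `nearest`, `match_pairs`) and the `max_dist` rejection threshold are hypothetical, and only the correspondence search of one ICP iteration is shown, not the pose update.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the tree point closest to query."""
    if node is None:
        return best
    point, left, right = node
    dist = math.dist(point, query)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

def match_pairs(source, target, max_dist):
    """Pair each source point with its nearest target point (hypothetical helper).

    Pairs farther apart than max_dist are dropped, a common way to ignore
    points outside the overlapping region.
    """
    tree = build_kdtree(target)
    pairs = []
    for p in source:
        d, q = nearest(tree, p)
        if d <= max_dist:
            pairs.append((p, q))
    return pairs
```

Each query costs roughly logarithmic time on average instead of a linear scan over the target cloud, which is where the speed-up for matching pairs comes from.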
- Research Article
2
- 10.5194/isprs-archives-xliii-b2-2022-343-2022
- May 30, 2022
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Digital three-dimensional (3D) reconstruction of objects has many applications in computer vision, archaeology, and the entertainment industry. Digital 3D reconstruction can be used to preserve the appearance of valuable historical artifacts; it can be used to track the pose of an object in the images, and it can facilitate object modelling. 3D reconstruction of objects in the past has been achieved using many sensors such as cameras and laser-strip scanners. Monocular camera-based object 3D modelling can be categorized into sparse feature detector/descriptor-based and dense silhouette-based approaches. Feature-based methods identify distinctive features on the objects (captured from many images). In contrast, silhouette-based methods only require a distinguishable boundary between the object and the background. Silhouette-based methods have the advantage that in the controlled setups, a special background can be designed to be distinguishable from the object of interest; therefore, uniquely identifiable textures on the object’s surface are not required. Despite their advantages, silhouette-based probabilistic reconstruction remains a challenge. This article proposes a new probabilistic approach using 3D occupancy grids for the silhouette-based digital reconstruction of an object. The proposed method is designed to be usable with monocular cameras and achieves an accurate reconstruction using only sixteen images. Compared to similar silhouette-based volumetric approaches, the voxels are not discarded immediately during the reconstruction, and the occupancy grid mapping continuously changes the occupancy probability of the voxels with each new image included.
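The idea of updating a voxel's occupancy probability with every new image, rather than carving it away on the first miss, can be sketched with the standard log-odds form of occupancy grid mapping. This is an illustration under stated assumptions, not the article's model: the sensor probabilities `p_inside` and `p_outside` and the function names are hypothetical, and the projection of a voxel into each silhouette image is assumed to be done elsewhere.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def update_voxel(log_odds, inside_silhouette, p_inside=0.7, p_outside=0.2):
    """One Bayesian update for a single voxel (uniform 0.5 prior assumed).

    p_inside / p_outside are hypothetical values for P(occupied | observation)
    when the voxel projects inside / outside the silhouette.
    """
    p = p_inside if inside_silhouette else p_outside
    return log_odds + logit(p)

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

# A voxel that falls outside the silhouette once is not discarded;
# later consistent observations can still raise its probability.
state = 0.0
for seen_inside in (True, True, False, True):
    state = update_voxel(state, seen_inside)
final_probability = probability(state)
```

A final threshold on the per-voxel probability (for example 0.5) would then extract the reconstructed volume.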
- Book Chapter
1
- 10.1007/978-3-319-13386-7_4
- Jan 1, 2014
We present a computational framework that combines depth and colour (texture) modalities for 3D scene reconstruction. The scene depth is captured by a low-power photonic mixer device (PMD) employing the time-of-flight principle, while the colour (2D) data is captured by a high-resolution RGB sensor. Such a 3D capture setting is instrumental in 3D face recognition tasks and more specifically in depth-guided image segmentation, 3D face reconstruction, pose modification and normalization, which are important pre-processing steps prior to feature extraction and recognition. The two captured modalities come with different spatial resolutions and need to be aligned and fused so as to form what is known as a view-plus-depth or RGB-Z 3D scene representation. We discuss specifically the low-power operation mode of the system, where the depth data appears very noisy and needs to be effectively denoised before fusing with colour data. We propose using a modification of the non-local means (NLM) denoising approach, which in our framework operates on complex-valued data, thus providing certain robustness against low-light capture conditions and adaptivity to the scene content. Further in our approach, we implement a bilateral filter on the range point-cloud data, ensuring a very good starting point for the data fusion step. The latter is based on the iterative Richardson method, which is applied for efficient non-uniform to uniform resampling of the depth data using structural information from the colour data. We demonstrate a real-time GPU-based implementation of the framework, which yields a high-quality 3D scene reconstruction suitable for face normalization and recognition.
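For readers unfamiliar with the Richardson method named in this abstract, the generic iteration for a linear system A x = b is x_{k+1} = x_k + ω (b − A x_k). The sketch below applies it to a tiny dense system; it assumes the chapter refers to this classical iteration, and the resampling operator, the step size `omega`, and the function name are all hypothetical stand-ins for illustration.

```python
def richardson(A, b, omega, iterations=200):
    """Classical Richardson iteration x <- x + omega * (b - A x).

    A is a small dense matrix (list of rows), b a list. For a symmetric
    positive definite A the iteration converges when
    0 < omega < 2 / (largest eigenvalue of A).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] for i in range(n)]
    return x

# Toy system standing in for the (much larger) resampling problem.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = richardson(A, b, omega=0.3)   # exact solution is (1/11, 7/11)
```

Each step needs only one application of the operator, so the iteration maps naturally onto a GPU.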
- Research Article
9
- 10.1016/j.optlaseng.2024.108737
- Mar 1, 2025
- Optics and Lasers in Engineering
High-turbidity underwater Fourier single-pixel 3D imaging based on pseudo-camera calibration with pixel-mapping at low sampling rates
- Research Article
159
- 10.1061/(asce)cp.1943-5487.0000446
- Dec 2, 2014
- Journal of Computing in Civil Engineering
Traditional crack assessment methods for concrete structures are time consuming and produce subjective results. The development of a means for automated assessment employing digital image processing offers high potential for practical implementation. However, two problems in two-dimensional (2D) image processing hinder direct application for crack assessment, as follows: (1) the image used for the digital image processing has to be taken perpendicular to the surface of the concrete structure, and (2) the working distance used in retrieving the imaging model has to be measured each time. To address these problems, this paper proposes a combination of 2D image processing and three-dimensional (3D) scene reconstruction to locate the 3D position of crack edges. In the proposed algorithm, first the precise crack information is obtained from the 2D images after noise elimination and crack detection using image processing techniques. Then, 3D reconstruction is conducted employing several crack images to ...
- Research Article
- 10.1142/s0218126624502827
- Jun 26, 2024
- Journal of Circuits, Systems and Computers
Three-dimensional (3D) scene reconstruction for moving objects remains a challenging research topic. It is crucial to effectively capture feature representations from dynamic and complex scenarios. To this end, this paper proposes a 3D reconstruction method for moving objects based on multi-scale attention and a dilated convolutional neural network (CNN). Specifically, a multi-scale attention algorithm framework that incorporates dilated CNNs is designed to extract multi-scale features of moving targets. The dilated CNN is incorporated to enhance the model’s perception ability and receptive field while maintaining a lightweight structure. This integrated design aims to automatically learn features and scene information at different scales. By increasing the effective range of information perception and further enhancing the quality of reconstruction results, a coordinate system is established for 3D scene reconstruction of moving targets. Finally, a comparative analysis of subjective vision, visualization, and reconstruction algorithms is conducted using real-world cases. The experimental results demonstrate that the proposed method exhibits significant advantages in the 3D scene reconstruction task of moving targets compared to traditional methods.
- Conference Article
5
- 10.1109/bmei.2013.6746908
- Dec 1, 2013
Three-dimensional (3D) reconstruction of medical images is widely applied to clinical diagnosis and treatment. The Marching Cubes (MC) algorithm is a well-known surface rendering method. However, when extracting an isosurface from scalar volumetric data sets, the standard MC algorithm sequentially traverses all cubes, both active and non-active, which is time-consuming and inefficient. In this study, an improved MC algorithm that combines seeded region growing with the standard MC algorithm is proposed to reconstruct encephalic tissue and the nasopharynx using the Visualization Toolkit (VTK). The main idea of the new algorithm is to avoid the computation of non-active cubes. Theoretical analysis and experimental results show that the improved MC algorithm accelerates the 3D reconstruction of medical images and removes the noise from the segmentation stage. Moreover, normal calculation and mesh smoothing are investigated to improve the rendering effect.
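The notion of an active cube, and the idea of reaching active cubes by growing from a seed instead of scanning the whole volume, can be sketched as follows. This is a generic illustration, not the paper's algorithm or VTK code: the function names, the `volume[z][y][x]` layout, and the 6-connected growth rule are assumptions made for the example.

```python
from collections import deque
from itertools import product

def is_active(volume, cube, iso):
    """A cube is active when the isovalue lies between its corner values."""
    x, y, z = cube
    vals = [volume[z + dz][y + dy][x + dx]
            for dz, dy, dx in product((0, 1), repeat=3)]
    return min(vals) < iso <= max(vals)

def grow_active_cubes(volume, seed, iso):
    """Collect active cubes reachable from an active seed cube.

    Growth is 6-connected (face neighbours). Only neighbours of active
    cubes are ever tested, so empty regions of the volume are skipped.
    """
    nz, ny, nx = len(volume) - 1, len(volume[0]) - 1, len(volume[0][0]) - 1
    if not is_active(volume, seed, iso):
        return set()
    visited = {seed}
    queue = deque([seed])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            n = (x + dx, y + dy, z + dz)
            inside = 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
            if inside and n not in visited and is_active(volume, n, iso):
                visited.add(n)
                queue.append(n)
    return visited
```

The triangulation of each collected cube would then proceed as in standard MC. A surface made of several disconnected components needs one seed per component, which is a limitation of this simplified growth rule.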
- Research Article
11
- 10.3329/diujst.v4i1.4348
- Jan 1, 1970
- Daffodil International University Journal of Science and Technology
This article presents a new method to determine a disparity map useful for three-dimensional (3D) scene reconstruction. The main task behind the computation of a disparity map is stereo correspondence matching. In recent years, several stereo matching algorithms have been developed to find corresponding pairs in two images: the left and right images captured by a stereo camera. But these algorithms exhibit a very high computational cost. With a view to reducing the computation time and producing a smooth and detailed disparity map, a fast new approach based on average disparity estimation is proposed in this research, which can tackle additive noise. Experimental results confirm that the method achieves a substantial gain in accuracy at a lower computational cost. Key Words: Disparity map, Stereo correspondence, Stereo Vision, 3D Scene Reconstruction. DOI: 10.3329/diujst.v4i1.4348 Daffodil International University Journal of Science and Technology Vol.4(1) 2009 pp.9-13
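To make the stereo correspondence task concrete, the sketch below shows the conventional window-based search that such methods aim to speed up, followed by the usual disparity-to-depth relation Z = f·B/d. It is a baseline illustration only and does not reproduce the article's average disparity estimation; the function names, the sum-of-absolute-differences cost, and the camera values are assumptions.

```python
def sad(left, right, x, d, half):
    """Sum of absolute differences between a left window at x and a right window at x - d."""
    return sum(abs(left[x + k] - right[x + k - d]) for k in range(-half, half + 1))

def scanline_disparity(left, right, max_disp, half=1):
    """Brute-force disparity for one rectified scanline (lists of intensities).

    For every pixel, all candidate disparities 0..max_disp are tried, which
    is the source of the high computational cost noted in the abstract.
    """
    width = len(left)
    disp = [0] * width
    for x in range(half, width - half):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - half - d < 0:      # window would leave the right image
                break
            cost = sad(left, right, x, d, half)
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp

def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d (d in pixels, d > 0)."""
    return focal_px * baseline_m / d

# Synthetic scanline: the left view is the right view shifted by 2 pixels.
right_line = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
left_line = [0, 0] + right_line[:-2]
disparities = scanline_disparity(left_line, right_line, max_disp=4)
```

With a hypothetical focal length of 700 px and a 0.1 m baseline, a disparity of 2 px corresponds to a depth of about 35 m.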
- Research Article
5
- 10.6100/ir735445
- Jan 1, 2012
- Data Archiving and Networked Services (DANS)
In the last two decades, minimally-invasive interventions have replaced traditional surgery in many clinical scenarios. In these interventions, the doctor manipulates small devices inside the patient through a small incision, while guided by live imaging. In many cases, this guidance is provided by low-dose X-ray imaging. At this moment, live image guidance conveys only two-dimensional (2D) information, whereas information on the 3D location and orientation of structures of interest would resolve 3D positioning ambiguities and significantly aid the accuracy and safety of these procedures. The work described in this thesis aims at providing accurate and reliable 3D information for current interventional systems, by employing multiple X-ray views, acquired with a limited motion of the X-ray imaging apparatus. The application of 2D image analysis techniques in combination with 3D modeling enables 3D reconstructions of objects in the image. The work in this thesis is organized into three layers of increasing complexity of the 3D reconstructed objects. Prior to addressing the reconstruction problem, the thesis begins in Chapter 2 with a consideration of important system aspects pertaining to X-ray imaging, focusing on image quality, which plays a crucial role in the success of image analysis algorithms. The reported work contributes an image quality assessment method, based on an information-theoretic approach, which encapsulates the major image quality aspects (namely contrast, sharpness and noise) and formulates them in the domain of information. Chapters 3 and 4 present the first layer of reconstruction, targeting single feature points. At this layer, our work has provided a thorough analysis of multi-view relations, as formulated for C-arm based X-ray. A distinction is made between the 2D image transformation, feature point detection and tracking step (Chapter 3), and the subsequent 3D camera modeling and reconstruction steps (Chapter 4). 
In this part of the thesis, we have contributed: (1) a method for evaluation of feature point detection techniques for non-planar scenes, (2) a tracking algorithm based on geometric constraints, which allows fast tracking of feature points, and (3) the first (to the best of our knowledge) analysis of the 3D point reconstruction accuracy and related requirements of multi-view X-ray. Simulation results show that 3D point reconstruction using 5-10 views spanning a rotation angle of 8.5°-17° is accurate to within 1 mm, while results on phantom sequences have shown that the tracked feature points can be reconstructed with an accuracy of about 1-4 mm. Chapters 5 and 6 discuss the second reconstruction layer of rigid objects. We have chosen to reconstruct curvilinear objects, as these may be used to model many surgical instruments such as catheters and needles. A 2D modeling step, described in Chapter 5, precedes the 3D reconstruction. The 2D modeling aims at detecting and tracking curves in the multi-view sequence, which are subsequently used in Chapter 6 to obtain 3D curves representing the objects of interest. The main contributions here are: (1) a novel algorithm, called SPD-RANSAC, for the detection of multiple (curvilinear) models in noisy images, (2) a curve tracking algorithm, based on geometric constraints and a cost function, and (3) a curve reconstruction technique, which can be potentially refined by adding a non-linear optimization step. Here we have demonstrated reconstructions with an accuracy of 1-2 mm for phantom datasets, and ≈ 5 mm for clinical datasets. This enables the simultaneous 2D detection, tracking and 3D reconstruction of several curvilinear instruments using only a few X-ray views. Chapter 7 treats the third layer of non-rigid object reconstruction, dealing with the challenging problem of 3D reconstruction when motion occurs during the image acquisition. 
In our application scenario, such motion stems from patient breathing, heartbeat, instrument manipulation by the doctor, etc. In computer vision, observing a moving object with a moving camera is an inherently underconstrained problem, termed Non-Rigid Structure-from-Motion. We analyze this complex problem for the case of steerable catheters used in cardiac ablation and contribute a solution for deformable, time-varying catheter reconstruction. A model from the field of Robotics is employed to parameterize deforming 3D+T shape. Simulations have shown that a non-linear optimization scheme succeeds in correctly recovering 3D+T catheter shape with an accuracy of a few millimeters, while phantom experiments recover catheter shape with a repeatability of 5 mm. The results demonstrated for each of the reconstruction layers have shown that multi-view X-ray can provide 3D reconstructions of relevant objects, with a sufficiently high accuracy for a number of interventions; the setup employed requires no additional equipment apart from the existing interventional X-ray system. We therefore conclude that multi-view X-ray, along with the techniques proposed in this thesis, can be employed in the near future for unambiguous 3D guidance in a real clinical scenario.
- Research Article
1
- 10.1088/1742-6596/2637/1/012053
- Nov 1, 2023
- Journal of Physics: Conference Series
Limited visual information contained in single images and complex motion models of objects may lead to severe fragmentation and confusion of model backbone in the 3D reconstruction of objects. In order to fully extract feature information in a single image and reduce noise interference caused by environmental factors in 3D reconstruction, a Parallel Dual-Branch Pyramid Stacking Network (PDB-PSN) model is proposed. A parallel dual-branch network model is used to construct an encoder-decoder framework based on cascaded feature extraction, encode/decode high and low-resolution features in a single image, and then convert from 2D to 3D through implicit functions to achieve the 3D reconstruction of target objects in a single image. A cascaded feature extraction network is used as a low-resolution feature extraction network to extract global features of objects. In the high-resolution feature extraction branch, three concatenated hourglass networks and dilated convolutions are used in the hourglass network to increase the receptive field and obtain more global information in order to maintain the integrity of the reconstructed object limbs. A threshold processing module is set to remove irrelevant information and ensure the integrity of global information, and meanwhile, to reduce the interference of irrelevant noise information. Simulation experiments on a self-built terracotta dataset show that the PDB-PSN model can completely reconstruct the 3D model in a single image and effectively eliminate model fragmentation in the reconstruction results.
- Research Article
- 10.3390/s26031036
- Feb 5, 2026
- Sensors (Basel, Switzerland)
Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional- and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications.
- Research Article
5
- 10.3390/jpm14090982
- Sep 16, 2024
- Journal of personalized medicine
Understanding complex neurosurgical procedures and diseases, such as skull-base meningiomas, is challenging for patients due to the intricate anatomy and the involvement of critical neurovascular structures. Enhanced patient comprehension is crucial for satisfaction and improved clinical outcomes. Patient-specific 3D models have demonstrated benefits in patient education, though they are costly and time-intensive to produce. This study investigates whether the use of 3D volumetric reconstructions with anatomical segmentation, widely available via neuronavigation software, can improve patients' understanding of skull-base meningiomas, surgical procedures, and potential complications. This study included twenty patients with skull-base meningiomas. Three-dimensional volume reconstructions and anatomical segmentations were created using preoperative MRI sequences with neuronavigation software. These reconstructions were used during patient consultations where a surgeon explained key aspects of the disease, the surgical intervention, and potential complications. A questionnaire assessed the patients' perceptions of the utility of these 3D reconstructions. The majority of patients (75%) found the 3D volumetric reconstructions and anatomical segmentations to be more beneficial than MRI images for understanding their disease. Similarly, 75% reported improved comprehension of the surgical approach, and 85% felt that the reconstructions enhanced their understanding of potential surgical complications. Overall, 65% of patients considered the 3D reconstructions valuable in medical consultations. Our study indicates that using accessible, cost-effective, and non-time-consuming 3D volumetric reconstructions with anatomical segmentation enhances patient understanding of skull-base meningiomas. 
Further research is necessary to confirm these findings, compare these reconstructions with physical 3D models and virtual reality models, and evaluate their impact on patient anxiety regarding the surgical procedure.
- Conference Article
8
- 10.1145/3583740.3630267
- Dec 6, 2023
Multi-view 3D reconstruction driven augmented, virtual, and mixed reality applications are becoming increasingly edge-native, due to factors such as rapid reconstruction needs, security/privacy concerns, and lack of connectivity to cloud platforms. Managing edge-native 3D reconstruction, given edge resource constraints and the inherent dynamism of 'in the wild' 3D environments, involves striking a balance between the conflicting objectives of achieving rapid reconstruction and satisfying minimum quality requirements. In this paper, we take a deeper dive into the multi-view 3D reconstruction latency-quality trade-off, with an emphasis on reconstruction of dynamic 3D scenes. We propose data-level and task-level parallelization of 3D reconstruction pipelines, holistic edge system optimizations to reduce reconstruction latency, and long-term minimum reconstruction quality satisfaction. The proposed solutions are validated through a collection of real-world 3D scenes with varying degrees of dynamism that are used to perform experiments on a hardware edge testbed. The results show that our solutions can achieve between 50% and 75% latency reduction without violating long-term minimum quality requirements.
- Conference Article
1
- 10.1109/cira.2003.1222131
- Jul 16, 2003
This paper addresses recent developments in circular line-scan imaging systems for applications of 3D scene visualization and/or reconstruction. Such an imaging system is characterized by rotating linear sensors, each capturing one image column at a time. This allows for accurate mappings onto a cylindrical image surface and very high image resolutions, at the cost of motion distortions in dynamic scenes. These images can be used, for example, for stereo visualization and 3D reconstruction in VR applications where extremely high image resolution is of benefit (for static scenes). The paper elaborates the basic geometry, the geometric analysis, and the design and control of imaging parameters to ensure high-quality 3D data acquisition.
- Conference Article
9
- 10.1117/12.2556122
- Apr 6, 2020
Image-based 3D scene reconstruction is one of the key tasks for many machine vision applications such as scene understanding, object pose estimation, and autonomous navigation. A set of reliable and accurate methods for multi-view 3D scene reconstruction has been developed over the last decades. But a significant drawback of such 3D reconstruction techniques is the need to acquire a large number of images in the processed sequence to obtain an acceptable 3D scene representation. Recently, modern convolutional neural network (CNN) models have achieved the best quality for object recognition, image segmentation, image translation and some other challenging computer vision problems. The paper proposes a convolutional neural network architecture and a technique for training data preparation which provide a prediction of a voxel model of a 3D scene with several objects. For CNN training and evaluation, a special dataset was collected and annotated. It contains image sequences of several scenes and corresponding depth images and 3D models of these scenes. The image sequence serves as the primary data used for further 3D scene reconstruction by the Structure from Motion (SfM) technique. SfM processing results in surface 3D models of all objects in the scene and camera positions and orientations for every image in a sequence. Then the surface 3D model is transformed into a voxel 3D model and segmented into separate objects. A conditional generative adversarial network architecture was developed for 3D reconstruction from a single image. Its generative part translates an input color image into an output voxel model. The discriminative part distinguishes the correct output (close to the real voxel model) from false output (a wrong output voxel model). Both parts are trained simultaneously on the prepared dataset. Evaluation on the testing part of the prepared dataset has demonstrated the ability to predict 3D models of previously unobserved complex scenes containing several objects. 
The proposed neural network architecture provides high generalization ability and improved resolution of predicted voxel 3D models.