Evaluation of single-image 3D reconstruction for plant model generation
Generating accurate and visually realistic 3D models of plants from single-view images is crucial yet challenging due to plants' intricate geometry and frequent occlusions. This capability matters because it supplements current plant datasets and enables non-destructive, high-throughput phenotyping for crop breeding and precision agriculture. More broadly, 3D reconstruction is particularly important because plant morphology is inherently three-dimensional: 2D representations miss occluded leaves, branching geometry, and volumetric traits. However, plants present unique challenges compared with common rigid objects, and most current generative methods have not been systematically tested in this domain, leaving a gap in understanding their reliability for realistic plant reconstruction. This study systematically evaluates six advanced generative techniques, namely Hunyuan3D 2.0, Trellis (Structured 3D Latents), One2345++, InstantMesh, Direct3D, and Unique3D, using the existing PlantDreamer dataset. Specifically, it reconstructs mesh models from images of bean plants and quantitatively assesses each method's performance against ground-truth models using Chamfer Distance, Normal Consistency, F-Score, PSNR, LPIPS, and CLIP Score. The paper also presents qualitative results on kale and mint plants. The results indicate that Hunyuan3D 2.0 achieves superior overall performance, suggesting its effectiveness in capturing complex plant structures. This work provides valuable insights into the strengths and limitations of contemporary 3D generative approaches, guiding future improvements in realistic plant digitisation.
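For readers unfamiliar with the geometric metrics above, the sketch below shows how Chamfer Distance and F-Score are typically computed between point clouds sampled from a reconstructed mesh and a ground-truth mesh. This is a minimal SciPy illustration; the threshold `tau` and any alignment or scale normalisation steps are assumptions, not the paper's exact protocol.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance between (N, 3) and (M, 3) point sets."""
    d_pg, _ = cKDTree(gt).query(pred)   # nearest-neighbour distances pred -> gt
    d_gp, _ = cKDTree(pred).query(gt)   # nearest-neighbour distances gt -> pred
    return d_pg.mean() + d_gp.mean()

def f_score(pred, gt, tau=0.01):
    """F-Score at threshold tau: harmonic mean of precision and recall."""
    d_pg, _ = cKDTree(gt).query(pred)
    d_gp, _ = cKDTree(pred).query(gt)
    precision = (d_pg < tau).mean()     # fraction of predicted points near gt
    recall = (d_gp < tau).mean()        # fraction of gt points near prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```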
- Research Article
- 9
- 10.1016/j.eswa.2022.119209
- Nov 7, 2022
- Expert Systems with Applications
3D SOC-Net: Deep 3D reconstruction network based on self-organizing clustering mapping
- Research Article
- 7
- 10.1109/access.2022.3179109
- Jan 1, 2022
- IEEE Access
The application of deep learning in the field of 3D reconstruction has greatly improved the quality of 3D object reconstruction. For methods that take the point cloud as supervision, previous research has mainly focused on the network architecture while setting the Chamfer Distance (CD) loss as the default loss function. However, CD contains only distance information and ignores directional information. In this paper, we introduce novel CD losses that consider direction and can be used in a 3D reconstruction network. These losses combine direction and distance information and come in two specific variants, Oriented Chamfer Distance (OCD) and Directional Chamfer Distance (DCD). Numerous experiments on deformable patch and point cloud reconstruction show that classic neural networks for 3D reconstruction trained with the OCD or DCD loss achieve better reconstruction results than those trained with the CD loss.
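The exact OCD and DCD formulations are defined in the paper; as one hedged reading of a direction-aware Chamfer loss, the PyTorch sketch below augments the standard CD distance term with a penalty on misaligned normals between matched points. The normal inputs and the weight `lam` are illustrative assumptions, not the paper's definitions.

```python
import torch

def directional_chamfer(p, q, n_p, n_q, lam=0.1):
    """p, q: (N, 3) / (M, 3) points; n_p, n_q: matching unit normals."""
    d = torch.cdist(p, q)                     # (N, M) pairwise distances
    idx_pq = d.argmin(dim=1)                  # nearest q for each p
    idx_qp = d.argmin(dim=0)                  # nearest p for each q
    dist = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
    # Direction term: 1 - cos(angle) between normals of matched pairs.
    dir_pq = (1 - (n_p * n_q[idx_pq]).sum(dim=1)).mean()
    dir_qp = (1 - (n_q * n_p[idx_qp]).sum(dim=1)).mean()
    return dist + lam * (dir_pq + dir_qp)
```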
- Research Article
- 10.1155/2024/5528497
- Jan 1, 2024
- Advances in Multimedia
As virtual reality technology advances, 3D environment design and modeling have garnered increasing attention. Applications in networked virtual environments span urban planning, industrial design, and manufacturing, among other fields. However, existing 3D modeling methods exhibit high reconstruction error, limiting their practicality in many domains, particularly environmental design. To enhance 3D reconstruction accuracy, this study proposes a digital image processing technology that combines binocular camera calibration, stereo correction, and a convolutional neural network (CNN) for optimization and improvement. By employing the refined stereo-matching algorithm, a 3D reconstruction model was developed to improve 3D environment design and reconstruction accuracy while optimizing the 3D reconstruction effect. An experiment on the ShapeNet dataset demonstrated that the evaluation indices of the proposed model, namely Chamfer distance (CD), Earth mover's distance (EMD), and intersection over union, outperformed those of alternative methods. After incorporating the CNN module in the ablation experiment, CD and EMD improved by an average of 0.1 and 0.06, respectively, validating that the proposed CNN module effectively enhances point cloud reconstruction accuracy. Upon adding the CNN module, the CD and EMD indices on the dataset improved by an average of 0.34 and 0.54, respectively. These results indicate that the proposed CNN module exhibits strong predictive capability for point cloud coordinates, and the model demonstrates good generalization performance.
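As background for the binocular setup described above, the sketch below shows the classical rectified-stereo relation Z = f·B/d that camera calibration and stereo correction make applicable; the CNN refinement itself is not reproduced here, and the parameter names are illustrative.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (disparity in pixels)."""
    depth = np.full(disparity.shape, np.inf, dtype=np.float64)
    valid = disparity > 0                    # zero disparity -> point at infinity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth                             # depth in the same units as the baseline
```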
- Conference Article
- 27
- 10.1109/cvpr42600.2020.00121
- Jun 1, 2020
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision. In this work, we propose a deep learning technique for 3D object reconstruction from a single image. Contrary to recent works that use either 3D supervision or multi-view supervision, we use only single-view images, with no pose information during training. This makes our approach more practical, requiring only an image collection of an object category and the corresponding silhouettes. We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner, making use of a differentiable point cloud renderer to train with 2D supervision. A key novelty of the proposed technique is to impose 3D geometric reasoning on the predicted 3D point clouds by rotating them with randomly sampled poses and then enforcing cycle consistency on both the 3D reconstructions and the poses. In addition, using single-view supervision allows us to perform test-time optimization on a given test image. Experiments on the synthetic ShapeNet and real-world Pix3D datasets demonstrate that our approach, despite using less supervision, achieves competitive performance compared to pose-supervised and multi-view supervised approaches.
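A hedged sketch of the cycle-consistency idea on the 3D side is given below: a predicted point cloud is rotated by a randomly sampled pose, and the reconstruction obtained from its rendering should recover the rotated cloud. `render`, `reconstruct`, and `chamfer` are hypothetical stand-ins for the paper's differentiable renderer, reconstruction network, and point-set loss; the pose branch of the cycle is omitted.

```python
import torch

def random_rotations(batch):
    """Sample random 3x3 rotation matrices via QR of Gaussian noise."""
    q, _ = torch.linalg.qr(torch.randn(batch, 3, 3))
    det = torch.linalg.det(q).reshape(batch, 1, 1)
    return q * det.sign()                    # flip sign so det = +1 (3x3 case)

def cycle_loss(points, reconstruct, render, chamfer):
    """points: (B, N, 3) predicted clouds; the rest are stand-in callables."""
    r = random_rotations(points.shape[0])    # randomly sampled poses
    rotated = points @ r.transpose(1, 2)     # rotate each predicted cloud
    re_pred = reconstruct(render(rotated))   # re-predict from the 2D rendering
    return chamfer(re_pred, rotated)         # 3D cycle-consistency term
```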
- Conference Article
- 10.1109/iscas48785.2022.9937859
- May 28, 2022
Recent state-of-the-art image-based three-dimensional (3D) reconstruction methods mainly represent 3D shapes with triangular meshes because of their memory efficiency and ability to capture surface detail compared with voxel and point cloud representations. Previous works usually follow an encoder-decoder pattern, in which a deep neural network extracts features from the image and reconstructs the 3D structure. This is a typical supervised learning process that requires a loss function to supervise training. No existing works directly calculate the loss between the reconstructed mesh and the ground-truth mesh; instead, they use the Chamfer Distance (CD) between point clouds as an indirect loss. Most previous works focus on the encoder and decoder rather than the loss, and CD is used throughout. However, when CD is applied to two point clouds with the same number of points, some points can match any number of points in the other point cloud, so some points are less involved in calculating the loss function, which reduces the utilization of information. We therefore propose a new point matching strategy for calculating the loss. The proposed strategy limits the maximum number of matches for each point, allowing more points to participate in the loss calculation and thereby improving the information utilization rate. Experiments on single-view reconstruction (SVR) and auto-encoding show that this new loss can replace CD in such works and yields better training results and 3D reconstruction quality.
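The paper's exact matching algorithm is not reproduced above, but the NumPy sketch below illustrates the general idea of capping matches: each target point may absorb at most `cap` correspondences, so more points participate in the loss. Details such as the processing order are assumptions for illustration.

```python
import numpy as np

def capped_matching_loss(src, tgt, cap=2):
    """Match each src point to its nearest tgt point that still has quota."""
    assert cap * len(tgt) >= len(src), "quota must cover all source points"
    d = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)  # (N, M)
    quota = np.full(len(tgt), cap)
    match = np.empty(len(src), dtype=int)
    for i in np.argsort(d.min(axis=1)):      # closest source points first
        for j in np.argsort(d[i]):           # candidate targets, nearest first
            if quota[j] > 0:
                match[i] = j
                quota[j] -= 1
                break
    return d[np.arange(len(src)), match].mean()
```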
- Book Chapter
- 10.1007/978-3-030-87361-5_12
- Jan 1, 2021
Recovering the 3D shape of an object from a single-view image with deep neural networks has attracted increasing attention in the past few years. Recent approaches based on convolutional neural networks have shown excellent results on single-view images. Most of them, however, either use many parameters or reduce the parameter count at the cost of degraded performance. In this work we therefore propose a feature selection module to balance this trade-off. The module first calculates an uncertainty map to obtain the coordinates of features corresponding to coarse parts that need correction. Features in several feature maps are then selected using these coordinates. Finally, an MLP layer takes the selected features as input to produce refined features. Training and inference differ slightly in this module. Using this module, we achieve better performance with about an 18% increase in parameters, and comparable performance with about a 30% decrease in parameters, based on the Pix2Vox [1] framework.
- Research Article
- 6
- 10.1007/s00371-022-02669-x
- Sep 15, 2022
- The Visual computer
This work deals with the automatic 3D reconstruction of objects from frontal RGB images. This aims at a better understanding of the reconstruction of 3D objects from RGB images and their use in immersive virtual environments. We propose a complete workflow that can be easily adapted to almost any other family of rigid objects. To explain and validate our method, we focus on guitars. First, we detect and segment the guitars present in the image using semantic segmentation methods based on convolutional neural networks. In a second step, we perform the final 3D reconstruction of the guitar by warping the rendered depth maps of a fitted 3D template in 2D image space to match the input silhouette. We validated our method by obtaining guitar reconstructions from real input images and renders of all guitar models available in the ShapeNet database. Numerical results for different object families were obtained by computing standard mesh evaluation metrics such as Intersection over Union, Chamfer Distance, and the F-score. The results of this study show that our method can automatically generate high-quality 3D object reconstructions from frontal images using various segmentation and 3D reconstruction techniques.
- Research Article
- 33
- 10.1109/access.2020.2992554
- Jan 1, 2020
- IEEE Access
Object 3D reconstruction from a single-view image is an ill-posed problem. Inferring the self-occluded part of an object makes 3D reconstruction a challenging and ambiguous task. In this paper, we propose a novel neural network, 3D-ReConstnet, an end-to-end network that generates a 3D point cloud model from a single-view image. 3D-ReConstnet uses a residual network to extract features from the 2D input image, yielding a feature vector. To deal with the uncertainty of the self-occluded part of an object, it uses a Gaussian probability distribution learned from the feature vector to predict the point cloud. 3D-ReConstnet can generate a deterministic 3D output for a 2D image with sufficient information, and it can also generate semantically different 3D reconstructions for the self-occluded or ambiguous parts of an object. We evaluated 3D-ReConstnet on the ShapeNet and Pix3D datasets and obtained satisfactory, improved results.
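As a hedged illustration of predicting a point cloud from a Gaussian distribution learned from the image feature vector, the PyTorch sketch below uses the standard reparameterisation trick; the layer sizes, latent dimensionality, and point count are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class GaussianPointDecoder(nn.Module):
    def __init__(self, feat_dim=512, n_points=1024):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)       # mean of the latent
        self.logvar = nn.Linear(feat_dim, feat_dim)   # log-variance of the latent
        self.decode = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_points * 3),
        )
        self.n_points = n_points

    def forward(self, feat):
        mu, logvar = self.mu(feat), self.logvar(feat)
        # Reparameterised sample: deterministic for confident (low-variance)
        # features, diverse for ambiguous self-occluded parts.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.decode(z).view(-1, self.n_points, 3)
```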
- Conference Article
- 10.13031/aim.202100282
- Jan 1, 2021
Plant morphological features are important factors that define plant production. Leaf area, leaf number, and plant height are important morphological features that relay critical information about how plants respond to environmental conditions such as lighting. Tracking plant morphology permits growers to better quantify the impact of different environmental factors on plant development to optimize growing conditions and increase yield. However, plant morphology tracking can be redundant and time-consuming, as non-destructive assessment must be performed manually. More precise morphological determinations may be conducted post-harvest, but this does not provide real-time data on how plants respond to their growing environment. The objective of this study was to develop an apparatus that determines plant architecture and morphological features with a 3D photogrammetry technique using croton (Codiaeum variegatum Blume cv. Petra), dumb cane (Dieffenbachia araceae cv. Camille), and kale (Brassica oleracea cv. Winterbor). Actual plant morphological measurements, including leaf area, leaf number, leaf angle, and plant height, were compiled and compared to data obtained from a 3D scanned plant model of the same plant created using 3D photogrammetry. Data collected with the developed apparatus indicate that this method shows potential in allowing plant scientists and growers to better assess how environmental factors influence morphological features during plant development, with the possibility of improving crop production.
- Preprint Article
- 10.5194/egusphere-egu2020-20566
- Mar 23, 2020
Volcanic ash suspended in the atmosphere can pose a significant hazard to aviation, with the potential to cause severe damage or shutdown of jet engines. Forecasts of ash-contaminated airspace are generated using atmospheric transport and dispersion models, among whose inputs are eruption source parameters such as cloud-top height and cloud volume. A potential method to measure these source parameters is space carving, a technique to generate 3D hull reconstructions of clouds using multi-angle imagery.

This paper investigates the potential for 3D space carving reconstruction using multi-angle satellite imagery. It builds on previous work where the authors applied this technique to ground-based and drone-based imagery. A satellite-based imaging platform has advantages such as global coverage and being safely removed from any damaging effects of a volcanic eruption. However, the accuracy of any potential reconstruction will be affected by the distances and restricted viewing angles of a satellite in orbit.

To assess the general suitability of a satellite-based system for reconstruction, as well as different configurations of the system, a method for simulating satellite imagery and applying a space carving reconstruction scheme was developed. This method allows analysis of the effects of orbital dynamics (altitude, inclination, etc.), spatial resolutions, and imaging rates on the efficacy of the 3D reconstruction of ash clouds. The model takes an input 'ground-truth' voxel-based plume model as the imaging target and generates simulated satellite images based on user-defined orbital and camera properties. These simulated images are then used for reconstruction, and the resulting plume can be compared against the ground-truth model.

A range of possible observation schemes (controlling the number and distribution of images and limits on viewing angles) have been modelled over a range of possible orbital paths, and the accuracy of the space carving reconstruction has been measured. Spatial resolution limits for the accurate reconstruction of various plume sizes can be calculated. Limitations of the model are presented, including sensitivity to the size and shape of the input plume model and the impact of assuming perfect feature identification in the simulated images. Further work includes the use of additional input models and improvements and validation of the image simulation method.

The methods presented in this study demonstrate the potential of satellite-based 3D reconstruction methods in the forecasting of ash dispersion, leading to potential improvements in airspace management and aviation safety.
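For readers new to space carving, the NumPy sketch below shows the core silhouette test in its simplest form: a voxel survives only if it projects inside the silhouette in every view. The `project` function (world points to integer pixel coordinates plus an in-front-of-camera flag) is a hypothetical camera model; real systems add feature identification and tolerance for segmentation noise.

```python
import numpy as np

def space_carve(voxels, silhouettes, cameras, project):
    """voxels: (V, 3) centres; silhouettes: boolean masks; cameras: per-view params."""
    keep = np.ones(len(voxels), dtype=bool)
    for sil, cam in zip(silhouettes, cameras):
        u, v, visible = project(voxels, cam)     # integer pixel coords + in-front flag
        inside = (visible & (u >= 0) & (u < sil.shape[1])
                          & (v >= 0) & (v < sil.shape[0]))
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]  # does the voxel land on the silhouette?
        keep &= hit                              # carve away voxels that miss any view
    return voxels[keep]
```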
- Book Chapter
- 132
- 10.1007/978-3-030-01237-3_49
- Jan 1, 2018
In this paper, we present a framework for reconstructing a point-based 3D model of an object from a single-view image. Distance metrics such as Chamfer distance were used in previous work to measure the difference between two point sets and serve as the loss function in point-based reconstruction. However, such point-to-point losses do not constrain the 3D model from a global perspective. We propose adding a geometric adversarial loss (GAL). It is composed of two terms: the geometric loss ensures that the shapes of reconstructed 3D models remain consistent and close to the ground truth from different viewpoints, and the conditional adversarial loss encourages semantically meaningful point clouds. GAL benefits the prediction of the occluded parts of objects and maintains the geometric structure of the predicted 3D model. Both the qualitative results and the quantitative analysis demonstrate the generality and suitability of our method.
- Research Article
- 10.55041/ijsrem44955
- Apr 22, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Accurate 3D reconstruction from 2D images plays a critical role in various applications including medical imaging, robotics, autonomous navigation, and augmented reality. Traditional reconstruction techniques often require multiple viewpoints or sensor setups, limiting their feasibility in resource-constrained environments. In this work, we propose a deep learning-based monocular 3D reconstruction pipeline that generates high-quality 3D models from a single RGB image. The core of this framework is a custom U-Net++ architecture, designed and trained on the NYU Depth V2 dataset for robust depth estimation. This model is evaluated against state-of-the-art alternatives, including MiDaS (DPT-Hybrid), Depth Anything V2, and GLPN, to assess its performance across accuracy, efficiency, generalization, and visualization quality. The proposed pipeline performs image preprocessing, depth map prediction, and 3D point cloud generation using Open3D, followed by mesh reconstruction techniques such as Poisson Surface Reconstruction. The evaluation metrics include MSE, SSIM, PSNR, and R² Score for depth maps, alongside qualitative analysis of 3D reconstruction quality. Comparative results demonstrate that while GLPN yields the most consistent performance, the custom U-Net++ model achieves competitive accuracy with significantly improved efficiency and adaptability, making it suitable for real-time or domain-specific deployments. This research highlights the potential of lightweight, custom-designed architectures for scalable and robust single-view 3D reconstruction. Future directions include multi-view integration, dataset expansion, and enhanced interpretability through uncertainty estimation. Keywords: Monocular Depth Estimation, 3D Reconstruction, U-Net++, MiDaS, GLPN, Deep Learning, Point Clouds, Open3D.
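The back end of such a pipeline, from a predicted depth map to a Poisson-reconstructed mesh, can be sketched with Open3D as below. The input file name, camera intrinsics, and truncation values are illustrative placeholders, not the paper's configuration.

```python
import numpy as np
import open3d as o3d

# Hypothetical depth prediction saved by the depth-estimation model.
depth = np.load("depth_pred.npy").astype(np.float32)
h, w = depth.shape

# Placeholder pinhole intrinsics: (width, height, fx, fy, cx, cy).
intrinsics = o3d.camera.PinholeCameraIntrinsic(w, h, 500.0, 500.0, w / 2, h / 2)

# Lift the depth map to a 3D point cloud (depths assumed in metres here).
pcd = o3d.geometry.PointCloud.create_from_depth_image(
    o3d.geometry.Image(depth), intrinsics, depth_scale=1.0, depth_trunc=10.0)

# Poisson surface reconstruction needs oriented normals.
pcd.estimate_normals()
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
o3d.io.write_triangle_mesh("mesh.ply", mesh)
```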
- Research Article
- 4
- 10.1016/j.gmod.2019.101050
- Nov 1, 2019
- Graphical Models
Semantic based autoencoder-attention 3D reconstruction network
- Book Chapter
- 10.1007/978-3-031-20868-3_29
- Jan 1, 2022
For the 3D reconstruction of objects in a real scene, the state-of-the-art scheme is to detect and identify the target with a classic deep neural network and reconstruct the 3D object with deep implicit function (DIF)-based methods. This scheme can be computationally and memory efficient, representing high-resolution geometry of arbitrary topology for reconstructing 3D objects in a scene. However, geometry constraints are lacking in these procedures, which may lead to fatal misidentification or structural errors in the reconstruction results. In this paper, we propose to strengthen the geometry constraints of DIF-based 3D reconstruction. A geometry retainer module (GRM) ensures that the detected target always retains the correct 2D geometry. The Chamfer distance (CD) is introduced as a constraint on the 3D geometry for the DIF-based method. Correspondingly, a strategy to extract a point cloud from the signed distance field (SDF) is proposed to realise this 3D geometry constraint. Extensive experiments show that our method greatly improves the quality of 3D reconstruction. Keywords: 3D reconstruction, deep learning, deep implicit function
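One common way to extract a point cloud from an SDF, in the spirit of the constraint described above, is to run marching cubes on the zero level set of a sampled grid and use the resulting vertices. The scikit-image sketch below assumes a regular grid and is not necessarily the paper's extraction strategy.

```python
import numpy as np
from skimage import measure

def sdf_to_points(sdf_grid, voxel_size=1.0, origin=(0.0, 0.0, 0.0)):
    """sdf_grid: (D, H, W) signed distances sampled on a regular grid."""
    # Vertices of the zero-isosurface approximate points on the SDF surface.
    verts, faces, normals, _ = measure.marching_cubes(sdf_grid, level=0.0)
    return verts * voxel_size + np.asarray(origin)  # grid -> world coordinates
```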
- Research Article
- 10
- 10.1109/tvcg.2021.3131712
- Mar 1, 2023
- IEEE Transactions on Visualization and Computer Graphics
3D reconstruction from single-view images is a long-standing research problem. There have been various methods based on point clouds and volumetric representations. Despite their success in generating 3D models, it is quite challenging for these approaches to deal with models with complex topology and fine geometric details. Thanks to recent advances in deep shape representations, learning structure and detail representations with deep neural networks is a promising direction. In this article, we propose a novel approach named STD-Net to reconstruct 3D models using a mesh representation, which is well suited for characterizing complex structures and geometric details. Our method consists of (1) an auto-encoder network for recovering the structure of an object with a bounding-box representation from a single-view image; (2) a topology-adaptive GCN for updating vertex positions for meshes of complex topology; and (3) a unified mesh deformation block that deforms the structural boxes into structure-aware meshes. Evaluation on ShapeNet and PartNet shows that STD-Net outperforms state-of-the-art methods in reconstructing complex structures and fine geometric details.