Articles published on Multi-View Stereo
1023 Search results
- Research Article
- 10.1016/j.cag.2026.104561
- May 1, 2026
- Computers & Graphics
- Yixiao Chen + 5 more
PS-GS: Gaussian splatting for multi-view photometric stereo
- Research Article
- 10.1016/j.patcog.2025.112907
- May 1, 2026
- Pattern Recognition
- Yongjian Liao + 8 more
Supervisory feedback for high-resolution low-textured large-scale multi-view stereo
- Research Article
- 10.1142/s230138502850015x
- Apr 21, 2026
- Unmanned Systems
- Zebiao Wu + 1 more
In multirotor-based photogrammetry, high-quality multi-view stereo modelling of a site requires a flight path that captures images from a wide range of viewing angles and altitudes. However, aerodynamic models show that vertical manoeuvring consumes notably more energy than horizontal cruising, and flight paths with many vertical transitions therefore lead to rapid battery depletion and lower operational time. Existing planning strategies typically address trajectory generation within continuous spaces, volumetric grids, or fixed uniform layers, but these approaches often fail to limit the vertical search space effectively and struggle to reduce energy usage without sacrificing the specific camera placements needed to reconstruct complex structural details. In this work, we propose a multi-stage planning framework that addresses the trade-off between energy efficiency and reconstruction quality by restricting the sampling space of feasible cameras to a sparse subset of vertical layers. To satisfy photogrammetric requirements within this reduced space, we introduce a view selection scheme based on global co-visibility analysis. This scheme prioritises surface regions that lack viewpoint observation redundancy and allocates camera viewpoints to capture intricate structural details that might otherwise be inadequately observed. Finally, a path planning stage generates collision-free trajectories within this sparse configuration, and this design implicitly minimises energy-intensive vertical transitions. Validation on real-world scanned scenes demonstrates that this approach reduces total flight energy by 6% to 25% compared to the next most energy-efficient baseline and maintains reconstruction completeness comparable to coverage-driven methods.
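The view selection step described above, prioritising surface regions that lack observation redundancy, can be illustrated with a greedy co-visibility loop. This is a minimal sketch and not the authors' algorithm: the function name `select_views`, the `min_views` redundancy threshold, and the representation of visibility as sets of surface point ids are all assumptions made for illustration.

```python
def select_views(candidates, min_views=2):
    """Greedily pick camera views that cover under-observed surface points.

    candidates: dict mapping a view id to the set of surface point ids it sees
                (a hypothetical precomputed visibility table).
    min_views:  a point counts as under-observed while it has fewer
                observations than this (assumed redundancy target).
    """
    # Observation count per surface point, starting at zero.
    counts = {p: 0 for pts in candidates.values() for p in pts}
    remaining = dict(candidates)
    chosen = []

    def gain(view):
        # Number of still under-observed points this view would add to.
        return sum(1 for p in remaining[view] if counts[p] < min_views)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # every remaining view only adds redundant observations
        chosen.append(best)
        for p in remaining.pop(best):
            counts[p] += 1
    return chosen
```

In the paper's setting the candidate views would presumably be restricted to the sparse set of vertical layers first; the sketch leaves that filtering to the caller.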
- Research Article
- 10.1145/3799231
- Apr 20, 2026
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Mingwei Cao + 6 more
Many multi-view stereo (MVS) networks with a cascaded structure can effectively estimate depth while saving memory. However, the accuracy of the depth map in the fine stage depends on the depth map estimated in the coarse stage. Additionally, the multi-stage depth maps generated by the cascaded structure are used to compute losses but are not reused, resulting in a loss of inter-stage differentiation information. To address these issues, we propose a dual-uncertainty estimation MVS method that learns an MVS network based on adjacent stage and pair-wise stage uncertainty estimation, named APMVS. The core of the proposed APMVS is to employ dual-uncertainty estimation to mitigate the adverse effects of the cascaded structure. Specifically, it involves two estimation modules: adjacent stage uncertainty (ASU) and pair-wise stage uncertainty (PSU). The ASU estimation module dynamically adjusts the depth-hypothesis range by leveraging uncertainty from the previous stage, thereby improving the accuracy of depth-map prediction in the current stage. The PSU estimation module estimates the uncertainty between each pair of stages. Thus, regions with high uncertainty have minimal impact. We evaluate the proposed APMVS on the DTU, Tanks and Temples, and BlendedMVS datasets. Experimental results show that our method achieves superior reconstruction quality compared with other state-of-the-art methods.
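The adjacent-stage idea in this abstract, adjusting the depth-hypothesis range from the previous stage's uncertainty, can be sketched per pixel as follows. This is an illustrative assumption rather than the APMVS implementation: the function name, the scale factor `k`, the floor `min_half_range`, and the treatment of uncertainty as a standard-deviation-like value are all hypothetical.

```python
def depth_hypotheses(prev_depth, prev_sigma, num_hyps=8, k=3.0, min_half_range=0.01):
    """Return evenly spaced depth hypotheses centred on the previous estimate.

    The search interval is prev_depth +/- k * prev_sigma, so a confident pixel
    (small sigma) searches a narrow range and an uncertain pixel a wide one.
    num_hyps must be at least 2.
    """
    half = max(k * prev_sigma, min_half_range)  # never collapse to zero width
    start = prev_depth - half
    step = 2.0 * half / (num_hyps - 1)
    return [start + i * step for i in range(num_hyps)]
```

For example, `depth_hypotheses(2.0, 0.1, num_hyps=5, k=2.0)` spans 1.8 to 2.2, while a larger `prev_sigma` widens the interval around the same centre.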
- Research Article
- 10.1016/j.optlaseng.2025.109578
- Apr 1, 2026
- Optics and Lasers in Engineering
- Chao Liu + 8 more
Hyperspectral images 3D reconstruction based on structure-from-motion and multi-view stereo
- Research Article
- 10.59075/jssa.v4i1.561
- Mar 13, 2026
- Journal for Social Science Archives
- Faisal Shah + 5 more
Augmented Reality (AR) applications depend on effective and robust 3D reconstructions of scenes to position virtual objects appropriately in the real world. However, high-fidelity 3D reconstruction on mobile and resource-constrained hardware remains a significant challenge due to memory, processing, sensor, and battery-life constraints. Conventional geometric methodologies, such as Structure from Motion (SfM), Multi-View Stereo (MVS), and feature-based Simultaneous Localization and Mapping (SLAM), have proven useful for camera tracking and sparse-to-dense mapping. However, these techniques tend to perform poorly in low-texture scenes, moving scenes, and complicated lighting situations. This paper addresses these shortcomings by presenting a lightweight hybrid 3D reconstruction system that combines traditional SLAM approaches with a small neural augmentation system. Within this framework, SLAM is used to estimate poses and map geometry precisely, and the neural component refines critical regions to improve texture quality and fill smaller gaps in reconstructions without adding much computational load. The system also exploits embedded systems and client-grade GPUs, attaining nearly real-time performance through GPU-based optimization, lightweight data structures, and an adaptive processing scheme. The experiments suggest that the proposed hybrid scheme notably enhances reconstruction accuracy and visual quality without compromising the performance specifications that are vital in resource-constrained settings.
- Research Article
- 10.3390/jimaging12030128
- Mar 13, 2026
- Journal of Imaging
- Ali Javadi Moghadam + 4 more
Three-dimensional (3D) reconstruction using images is one of the most significant topics in computer vision and photogrammetry, with wide-ranging applications in robotics, augmented reality, and mapping. This study investigates methods of 3D reconstruction using video (especially monocular video) data and focuses on techniques such as Structure from Motion (SfM), Multi-View Stereo (MVS), Visual Simultaneous Localization and Mapping (V-SLAM), and videogrammetry. Based on a statistical analysis of SCOPUS records, these methods collectively account for approximately 6863 journal publications up to the end of 2024. Among these, about 80 studies are analyzed in greater detail to identify trends and advancements in the field. The study also shows that the use of video data for real-time 3D reconstruction is commonly addressed through two main approaches: photogrammetry-based methods, which rely on precise geometric principles and offer high accuracy at the cost of greater computational demand; and V-SLAM methods, which emphasize real-time processing and provide higher speed. Furthermore, the application of IMU data and other indicators, such as color quality and keypoint detection, for selecting suitable frames for 3D reconstruction is investigated. Overall, this study compiles and categorizes video-based reconstruction methods, emphasizing the critical step of keyframe extraction. By summarizing and illustrating the general approaches, the study aims to clarify and facilitate the entry path for researchers interested in this area. Finally, the paper offers targeted recommendations for improving keyframe extraction methods to enhance the accuracy and efficiency of real-time video-based 3D reconstruction, while also outlining future research directions in addressing challenges like dynamic scenes, reducing computational costs, and integrating advanced learning-based techniques.
- Research Article
- 10.4287/jsprs.65.22
- Mar 10, 2026
- Journal of the Japan Society of Photogrammetry and Remote Sensing
- Kazuki Yoshida
Application of Structure-from-Motion (SfM) and Multi-View Stereo (MVS) to archival aerial photographs enables quantitative re-examination of terrain and surface changes caused by past disasters. This study reassessed three types of historical disasters (volcanic eruptions, landslides, and windthrow events) through differential analysis of pre- and post-event digital surface models (DSMs) reconstructed from the imagery. The SfM/MVS-derived DSMs captured volcanic terrain changes such as scoria-cone formation and lava-flow emplacement, and quantitatively delineated areas affected by landslides and windthrow.
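The differential analysis of pre- and post-event DSMs mentioned above amounts to subtracting two co-registered elevation grids cell by cell. The sketch below is an assumption-laden illustration, not the study's workflow: it assumes both grids are already aligned on the same cells, uses `None` for missing cells, and the function name `dsm_difference` is hypothetical.

```python
def dsm_difference(pre, post, cell_size):
    """Subtract two aligned DSM grids and summarise the volume change.

    pre, post: lists of rows of elevations in metres (None marks no data).
    cell_size: ground size of one square cell in metres.
    Returns (diff_grid, gain_volume_m3, loss_volume_m3).
    """
    cell_area = cell_size * cell_size
    diff_grid, gain, loss = [], 0.0, 0.0
    for row_pre, row_post in zip(pre, post):
        row = []
        for before, after in zip(row_pre, row_post):
            if before is None or after is None:
                row.append(None)  # skip cells missing in either epoch
                continue
            d = after - before
            row.append(d)
            if d > 0:
                gain += d * cell_area   # deposition or growth
            else:
                loss += -d * cell_area  # erosion or removal
        diff_grid.append(row)
    return diff_grid, gain, loss
```

In practice a threshold on `d` tied to the DSM uncertainty would normally be applied before interpreting small differences as real change.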
- Research Article
- 10.1016/j.dib.2026.112642
- Mar 6, 2026
- Data in Brief
- Prasad Nethala + 5 more
Three-dimensional (3D) point-cloud phenotyping enables non-destructive and repeatable characterization of plant architecture, supporting the measurement of traits such as internode length, branching topology, and organ orientation. This article presents TomatoPGT (Tomato Plant Graph Twin), a 3D tomato dataset designed for research on semantic/instance segmentation, graph-based structural representation, and graph-derived phenotypic trait extraction. The dataset contains 42 scans from three greenhouse-grown tomato plants acquired across early to mid-vegetative development using a rotational multi-view imaging system. Each scan consists of 60–70 overlapping RGB images captured under uniform illumination and reconstructed into a metrically scaled dense colored point cloud using Structure-from-Motion and multi-view stereo. TomatoPGT provides: (i) multi-view RGB images, (ii) dense colored point clouds, (iii) manually curated semantic and instance annotations at organ level, (iv) graph representations encoding plant topology and geometry, and (v) tabulated phenotypic traits computed deterministically from the graphs (internode length, insertion angles, and phyllotactic angles). TomatoPGT supports reproducible development and evaluation of 3D phenotyping pipelines, including learning-based segmentation and graph-based modeling of plant architecture.
- Research Article
- 10.1364/ao.589623
- Feb 27, 2026
- Applied Optics
- Xingsheng Liu + 1 more
Three-dimensional (3D) vision reconstruction has been an essential approach to scene understanding, which can hardly strike a good balance among compact structures, high resolution, and flexible viewpoints. In this paper, we demonstrate a prism-empowered virtual multiview stereo vision architecture that allows 3D reconstruction with a single stationary camera. A variable-boresight perspective projection model is presented to characterize the multiview image acquisition process using the combination of a camera and a rotational wedge prism. A model-driven 3D reconstruction performance analysis method is further proposed by considering systematic and random errors, from which an error suppression strategy based on cross-view constraints is formulated. Moreover, a flexible multiview stereo matching method is developed by automatically calibrating the prism-driven dynamic virtual camera via motion estimation and intrinsic optimization. It has been validated through experiments that the multiview epipolar geometry constructed from automatic calibration is reliable and robust, therefore enhancing the accuracy of stereo matching and the efficiency of object reconstruction.
- Research Article
- 10.1587/elex.22.20250640
- Feb 25, 2026
- IEICE Electronics Express
- Chaoyu Lian + 1 more
Multi-view stereo (MVS) 3D reconstruction is a fundamental task in computer vision, aiming to recover accurate scene geometry from multi-view images. However, existing methods continue to confront formidable challenges when handling complex scenes. To effectively address the above issues, this paper introduces an improved multi-view stereo-matching framework, AC-GoMVS. To mitigate the susceptibility of standard convolutions to depth noise at occlusion boundaries, we introduce an Adaptive Geometry-aware 3D Convolution (Agp-Conv3D) that exploits a dual-stream architecture comprising a principal pathway and a residual pathway. Furthermore, a dynamic attention mechanism is incorporated into the main path to adaptively adjust sampling positions of depth hypotheses, significantly improving edge detail reconstruction. In addition, a channel attention mechanism is embedded within the geometry consistency aggregation module to dynamically recalibrate the weight distribution of multi-level geometric features, addressing the feature mismatch issue caused by fixed-weight kernels in traditional 3D convolutions. Simultaneously, skip connections and residual links interlaced between the downsampling and upsampling pathways not only preserve rich fine-grained information, but also markedly enhance feature diversity. We evaluated our method on the DTU dataset and the Tanks and Temples benchmark. Experimental results show that, compared with the baseline model, the proposed approach achieves better reconstruction quality and stronger generalization ability.
- Research Article
- 10.1080/01431161.2026.2631696
- Feb 23, 2026
- International Journal of Remote Sensing
- Guido L Bacino + 3 more
This study aims to evaluate the dynamics of a semi-hanging beach–dune system in an open sandy beach during winter and summer 2021–2022, using UAV-based Structure-from-Motion (SfM) and Multi-View Stereo (MVS) surveys. Morphometric and morphodynamic parameters were extracted from DEM-derived cross-shore profiles and 3D point clouds were compared. Oceanographic forcings were analysed to identify storm events and dune impact hours, while sediment volume balance was quantified. Results highlighted the critical role of winter storms (May–September), where five extreme events triggered foredune erosion of 8600 m³, with an average dune toe retreat of 3.5 m. Sediments were redistributed on the beach, accumulating 4100 m³ and favouring storm berm and cusp formations, which were completely eroded in summer. Over the annual period, the foredune was unable to recover from the storm-induced sediment deficit. Post-storm alongshore variability was mainly controlled by antecedent beach and dune slopes, dune toe height, and the cumulative duration of impact hours. Overall, the results indicate a clear sedimentary imbalance in the system, characterized by high vulnerability to severe events and low resilience. This study contributes new insight into the morphodynamic behaviour and resilience of semi-hanging beach–dune systems on the Buenos Aires coast and demonstrates the effectiveness of UAV-based SfM-MVS monitoring in capturing coastal morphodynamics.
- Research Article
- 10.3390/app16042133
- Feb 22, 2026
- Applied Sciences
- Se-Yun Hwang + 4 more
This study presents a real-time framework for generating two-dimensional (2D) orthomosaic maps directly from UAV video. The method targets operational scenarios in which a continuously updated 2D overview is required during flight or immediately after landing, without relying on time-consuming offline photogrammetry workflows such as structure-from-motion (SfM) and multi-view stereo (MVS). The proposed procedure incrementally registers sparsely sampled video frames on standard CPU hardware using classical feature-based image registration. Each selected frame is converted to grayscale and processed under a fixed keypoint budget to maintain predictable runtime. Tentative correspondences are obtained through descriptor matching with ratio-test filtering, and outliers are removed using random sample consensus (RANSAC) to ensure geometric consistency. Inter-frame motion is modeled by a planar homography, enabling the mapping process to jointly account for rotation, scale variation, skew, and translation that commonly occur in UAV video due to yaw maneuvers, mild altitude variation, and platform motion. Sequential homographies are accumulated to warp incoming frames into a global mosaic canvas, which is updated incrementally using lightweight blending suitable for real-time visualization. Experimental results on three UAV video sequences with different durations, flight patterns, and scene targets report representative orthomosaic-style outputs and per-step CPU runtime statistics (mean, 95th percentile, and maximum), illustrating typical operating behavior under the tested settings. The framework produces visually coherent orthomosaic-style maps in real time for approximately planar scenes with sufficient overlap and texture, while clarifying practical failure modes under weak texture, motion blur, and strong parallax. Limitations include potential drift over long sequences and the absence of ground-truth references for absolute registration-error evaluation.
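The accumulation of sequential homographies described above can be sketched with plain 3×3 matrix products. This is a minimal illustration under stated assumptions, not the paper's code: each pairwise homography is assumed to map pixel coordinates of frame k+1 into frame k, the first frame defines the mosaic canvas, and the helper names are hypothetical. Feature matching and RANSAC estimation of the pairwise matrices are left out.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def accumulate_homographies(pairwise):
    """Chain pairwise homographies into frame-to-canvas homographies.

    pairwise[k] maps frame k+1 into frame k, so the global transform of
    frame k+1 is global[k] @ pairwise[k]. Frame 0 is the canvas (identity).
    """
    chain = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
    for h in pairwise:
        g = matmul3(chain[-1], h)
        scale = g[2][2]  # assumed non-zero; renormalise to limit numeric growth
        chain.append([[v / scale for v in row] for row in g])
    return chain

def warp_point(h, x, y):
    """Apply homography h to pixel (x, y) with the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Because every new transform is a product of all earlier ones, small registration errors compound along the chain, which is consistent with the drift over long sequences that the abstract lists as a limitation.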
- Research Article
- Feb 18, 2026
- Beijing da xue xue bao. Yi xue ban = Journal of Peking University. Health sciences
- Y Yang + 10 more
To explore the methodology and feasibility of reconstructing soft tissue morphology for fixed implant rehabilitation in edentulous patients using multi-view stereo vision technology, and to conduct a preliminary evaluation of the method's in vitro accuracy. A pair of edentulous resin models were designed and printed, with 6 implant analogs placed in the maxilla and 4 in the mandible. The experimental group (n=10) utilized a self-developed photogrammetric quad-camera system and the automated reconstruction software RealityScan 2.0.1. Self-developed scan bodies were attached to the analogs, and the handheld camera system was used to capture images of the models in vitro. The images were imported into the software to reconstruct the 3D models, and the data were exported as ".stl" files. The control group (n=10) used an intraoral scanner. Scan caps were attached to the analogs, and the models were scanned to generate ".stl" data. Reference data were obtained by scanning the maxillary and mandibular resin models once each with a desktop scanner (EX-PRO). All data were imported into Geomagic Wrap 2021. The root mean square (RMS) was calculated by comparing the 3D morphology of the experimental and control group data against the reference data to represent the magnitude of the 3D morphological deviation and evaluate accuracy. The evaluation was conducted in 4 specific regions: the alveolar ridge, peri-implant soft tissue, buccal, and lingual areas. In the maxilla, the RMS of the experimental group was significantly higher than the control group in the alveolar ridge [(124.89±21.30) μm vs. (53.90±8.93) μm, P < 0.001], peri-implant soft tissue [(157.74±19.13) μm vs. (67.03±3.94) μm, P < 0.001], and lingual areas [(146.01±33.87) μm vs. (46.20±11.19) μm, P < 0.001]. The RMS in the buccal area was lower for the experimental group than the control group [(50.56±8.34) μm vs. (53.83±12.66) μm], but the difference was not statistically significant (P=0.571). 
In the mandible, the RMS of the experimental group was significantly higher than the control group in the alveolar ridge [(254.04±88.42) μm vs. (58.28±38.96) μm, P < 0.001], peri-implant soft tissue [(165.18±21.30) μm vs. (70.48±28.20) μm, P < 0.001], and lingual areas [(421.75±59.51) μm vs. (54.59±36.77) μm, P < 0.001]. When comparing the buccal and lingual sides, the lingual RMS was significantly higher than the buccal RMS for the experimental group in both the maxilla (P < 0.001) and mandible (P < 0.001). For the control group, the maxillary lingual RMS was significantly lower than the buccal RMS (P < 0.05), while the mandibular lingual RMS was higher than the buccal, but the difference was not statistically significant (P=0.378). The self-developed quad-camera system, combined with multi-view stereo vision reconstruction software, can successfully record the 3D morphology of soft tissue. This study provides a research foundation for the development of extraoral photogrammetric devices capable of simultaneously determining the spatial positions of multiple implant units and acquiring soft tissue morphology.
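The RMS values reported above summarise per-point deviations between a test surface and the reference surface. As a minimal sketch (the function name is hypothetical, and the deviations are assumed to be signed point-to-reference distances already computed by the comparison software):

```python
import math

def rms_deviation(deviations_um):
    """Root mean square of signed surface deviations, in the input unit (here μm)."""
    if not deviations_um:
        raise ValueError("at least one deviation is required")
    return math.sqrt(sum(d * d for d in deviations_um) / len(deviations_um))
```

Squaring removes the sign, so positive and negative deviations do not cancel; a lower RMS therefore indicates closer agreement with the reference scan.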
- Research Article
- 10.3390/s26041251
- Feb 14, 2026
- Sensors (Basel, Switzerland)
- Lander De Waele + 2 more
Accurate modeling of residual limb geometry is essential for prosthetic socket design, yet current scanning techniques can be costly, operator-dependent, or impractical for repeated clinical use. This study presents a fully automated, low-cost photogrammetry workflow capable of generating metrically accurate 3D models of lower-limb residual limbs using video and still images acquired with a standard smartphone or a full-frame digital camera. The pipeline integrates adaptive frame selection, deep learning-based background removal, robust metric scaling via ArUco markers, and open-source Structure-from-Motion and Multi-View Stereo reconstruction, requiring no manual post-processing or proprietary software. Accuracy and repeatability were evaluated using four 3D-printed limb phantoms and high-resolution CT-derived meshes as ground truth. Smartphone video and full-frame camera acquisitions achieved sub-millimeter surface accuracy, volume and perimeter errors within ±1%, and high inter-session repeatability, all within clinically accepted thresholds for prosthetic socket fabrication. In contrast, smartphone still-photo reconstructions showed larger deviations and reduced stability. Acquisition time was under five minutes, and complete reconstruction required approximately 1 h and 30 min. These results demonstrate that smartphone video-based photogrammetry provides a practical, scalable, and clinically viable alternative for residual limb modeling, particularly in resource-constrained or remote care settings.
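Metric scaling from fiducial markers, as mentioned above, reduces to comparing a known physical length with the same length measured in the unscaled reconstruction. The sketch below is an assumption, not the study's pipeline: it presumes the four corners of each ArUco marker have already been located in model coordinates, and the function names are hypothetical.

```python
import math
import statistics

def metric_scale(marker_corners, marker_side_m):
    """Estimate the model-to-metres scale factor from reconstructed markers.

    marker_corners: list of markers, each a list of four 3D corner points
                    ordered around the marker, in arbitrary model units.
    marker_side_m:  printed side length of the markers in metres.
    """
    ratios = []
    for corners in marker_corners:
        for i in range(4):
            edge = math.dist(corners[i], corners[(i + 1) % 4])
            ratios.append(marker_side_m / edge)
    # The median keeps a single badly reconstructed edge from skewing the scale.
    return statistics.median(ratios)

def apply_scale(points, scale):
    """Scale every 3D point so that distances are expressed in metres."""
    return [tuple(scale * c for c in p) for p in points]
```

With a 5 cm marker reconstructed as a square of side 2.0 model units, the estimated scale is 0.025 metres per unit.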
- Research Article
- 10.5194/isprs-archives-xlviii-2-w12-2026-383-2026
- Feb 12, 2026
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Mattia Previtali + 2 more
Abstract. Orthorectified façade imagery and metric 3D models support architectural documentation, conservation, and further analysis. Orthophoto production from standard Structure-from-Motion and Multi-View Stereo (SfM–MVS) pipelines performs well on broad opaque surfaces, but can be challenging on thin, repetitive, and partly transparent elements (e.g., railings, balusters, grilles). Indeed, in the latter cases depth estimation becomes unstable, filtering removes some structures, and meshing priors thicken elements or bridge voids, compromising both geometry generation and orthophoto production. This paper evaluates 3D Gaussian Splatting (3DGS) as a surface-free alternative for metric façade representation and orthophoto generation in such conditions. We propose a compact façade-plane alignment and scale control procedure to render orthographic products. On a real façade dataset acquired under diffuse illumination, we compare a standard SfM–MVS true-orthophoto baseline with three 3DGS workflows: PostShot training with a custom orthographic renderer, Tortho-Gaussian for optimization and orthographic rendering, and Blender rendering of PostShot splats. Quality is assessed, using a laser scanning acquisition as a benchmark, via completeness, edge fidelity and topological preservation. Results indicate that 3DGS better preserves the topological pattern in railing regions, keeping members separated and apertures open, and enables rapid orthographic rendering once trained. SfM–MVS shows better results on large, well-textured wall areas, whereas 3DGS may introduce mild edge softening or halos at high-contrast boundaries.
- Research Article
- 10.5194/isprs-archives-xlviii-2-w12-2026-263-2026
- Feb 12, 2026
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Adrian Macek + 5 more
Abstract. This study presents a methodology for dense point cloud fusion based on Multi-Criteria Decision-Making (MCDM) techniques, applied to heritage documentation. Photogrammetric reconstruction was conducted using both classical Multi-View Stereo (MVS) algorithms, including Agisoft Metashape, RealityScan, and OpenMVS, as well as a learning-based method (VIS-MVS Net). UAV imagery of historical buildings from the Museum of the Kielce Countryside served as input data, while terrestrial laser scanning (TLS) provided reference datasets. Point cloud quality was evaluated based on completeness, density, and geometric accuracy, with additional metrics assessing surface roughness, planarity, and variance. The proposed fusion approach employed the CRITIC method to assign objective weights to geometric descriptors and used TOPSIS and OWA algorithms to compute point quality scores and merge multiple datasets. The MCDM-based fusion method effectively integrated point clouds of varying origins, preserving structural fidelity and surface smoothness while compensating for missing data. The developed methodology provides a systematic and objective framework for integrating multi-source point clouds, supporting advanced heritage documentation and metrological applications.
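The weighting and scoring steps named above can be sketched with the textbook forms of CRITIC and TOPSIS. This is an illustrative assumption and not the authors' implementation: the descriptor columns, the benefit/cost flags, and the function names are hypothetical, and the OWA merging step is omitted. The sketch assumes at least two criteria and that no criterion is constant across points.

```python
import math

def _column(matrix, j):
    return [row[j] for row in matrix]

def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def critic_weights(matrix, benefit):
    """CRITIC: weight = contrast (std dev) times conflict (1 - correlation)."""
    cols = []
    for j in range(len(matrix[0])):
        col = _column(matrix, j)
        lo, hi = min(col), max(col)
        # Min-max normalise so that 1 is always the preferred end.
        cols.append([(x - lo) / (hi - lo) if benefit[j] else (hi - x) / (hi - lo)
                     for x in col])
    info = []
    for col in cols:
        mean = sum(col) / len(col)
        sigma = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        conflict = sum(1.0 - _pearson(col, other) for other in cols)
        info.append(sigma * conflict)
    total = sum(info)
    return [c / total for c in info]

def topsis_scores(matrix, weights, benefit):
    """TOPSIS: closeness to the ideal solution, in [0, 1] (1 is best)."""
    n = len(weights)
    norms = [math.sqrt(sum(x * x for x in _column(matrix, j))) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(_column(v, j)) if benefit[j] else min(_column(v, j)) for j in range(n)]
    worst = [min(_column(v, j)) if benefit[j] else max(_column(v, j)) for j in range(n)]
    return [math.dist(row, worst) / (math.dist(row, best) + math.dist(row, worst))
            for row in v]
```

A fusion step could then keep, for each local neighbourhood, the points with the highest closeness score; that selection rule is left to the caller here.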
- Research Article
- 10.5194/isprs-archives-xlviii-2-w12-2026-143-2026
- Feb 12, 2026
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Widiatmoko A Fadilah + 3 more
Abstract. Urban heritage increasingly faces threats from natural disasters and anthropogenic activities, demanding preservation techniques that balance affordability, speed, and accuracy. Real-time rendering with 3D Gaussian Splatting (3DGS) offers immense potential for rapid heritage documentation; however, knowledge of optimal training configurations, especially for aerial nadir images, remains limited. This study compares three 3DGS implementations (native 3DGS, Postshot Splat3, and MCMC variants) against conventional multi-view stereo using drone aerial imagery captured at an altitude of 80 m. The results show that moderate downsampling to 1/8 resolution can substantially outperform full-resolution training. Postshot Splat3 achieved optimal photometric quality while training 21x faster than native implementations. However, geometric analysis showed that rendering quality and geometric accuracy are not correlated. Furthermore, the results suggest that MCMC configurations failed to compete with native 3DGS and Postshot Splat3. These findings suggest that 3DGS-based heritage documentation requires a rather conservative resolution selection and an implementation optimized for efficiency to allow cost-effective preservation while maintaining dimensional fidelity.
- Research Article
- 10.3390/plants15040525
- Feb 7, 2026
- Plants (Basel, Switzerland)
- Xingmei Xu + 9 more
Forest ecosystems play a pivotal role in maintaining the balance of the global carbon cycle and conserving biodiversity. High-density point clouds derived from unmanned aerial vehicle (UAV) structure from motion (SfM) and multi-view stereo (MVS) technologies offer a cost-effective solution for data acquisition. These technologies have become efficient tools for facilitating precision forest resource management and extracting individual tree structural parameters. However, in complex forest scenarios during the leaf-off season, canopies exhibit unstructured branch network morphologies due to the absence of leaf occlusion, and adjacent crowns are heavily interlaced. Consequently, existing segmentation methods struggle to overcome challenges associated with fuzzy boundaries and instance adhesion. To address these challenges, this study proposes TreeSeg-Net, an end-to-end instance segmentation network designed to precisely separate individual trees directly from raw point clouds. The network incorporates a global context attention module (GCAM) to capture long-range feature dependencies, thereby compensating for the limitations of sparse convolution in perceiving global information. Simultaneously, a spatial proximity weighting module (SPWM) is designed. By introducing geometric center constraints and a distance penalty mechanism, this module effectively mitigates under-segmentation issues caused by the feature similarity of adjacent branches in high-canopy-density environments. Experimental results demonstrate that TreeSeg-Net achieves an average precision (AP) of 97.2% in instance segmentation tasks and a mean intersection over union (mIoU) of 99.7% in semantic segmentation tasks. Compared to mainstream networks, the proposed method exhibits superior segmentation accuracy, providing an efficient and automated technical solution for precise resource inventory in complex forest environments.
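The mean intersection over union (mIoU) reported above is a standard per-class overlap metric. A minimal sketch over per-point labels follows; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions, and the paper's exact evaluation protocol may differ.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, truth: equal-length sequences of integer class labels, one per point.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # ignore classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, predictions `[0, 0, 1, 1]` against labels `[0, 1, 1, 1]` give an IoU of 1/2 for class 0 and 2/3 for class 1, so the mean is 7/12.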
- Research Article
- 10.1186/s13007-026-01505-w
- Feb 7, 2026
- Plant Methods
- Xulong Huang + 7 more
Precise, non-destructive phenotyping of saffron during vegetative growth is critical for optimizing corm yield and accelerating breeding programs, yet systematic 3D measurements have remained elusive due to extreme morphological challenges: ultra-narrow leaves, severe mutual occlusion, and prostrate growth architecture. Traditional single-view imaging systems fail to resolve individual leaves under such conditions, limiting phenotypic analysis to whole-canopy descriptors. Here, we developed a specialized organ-level 3D phenotyping workflow specifically designed for narrow, overlapping leaves using a low-cost dual-camera rotary acquisition system integrated with open-source Structure-from-Motion Multi-View Stereo (SfM-MVS) reconstruction. The dual-perspective strategy reduces occlusion-induced errors by 75% compared to single-view approaches, enabling robust organ-level segmentation via a multi-constraint clustering strategy. Automated measurements of leaf length and width across five developmental stages demonstrate exceptional agreement with manual references (R² > 0.94, MAPE < 6%), achieving accuracy benchmarks established for broad-leaved crops using commercial-grade hardware at 100× lower cost. Systematic voxel sensitivity analysis across nine scales identified optimal preprocessing parameters (2 cm voxel size) balancing measurement precision with computational efficiency, addressing a critical reproducibility gap in plant phenotyping. Exploratory longitudinal tracking revealed that above-ground biomass was correlated with final corm yield (r = 0.68, P < 0.001), with mid-vegetative canopy volume also showing strong correlation (r = 0.52, P < 0.01), suggesting potential resource allocation trade-offs between vegetative expansion and storage organ development. This work demonstrates that organ-level 3D phenotyping of narrow, overlapping leaves is achievable using low-cost imaging hardware and transparent methodological workflows. Complete documentation of algorithmic parameters and hardware specifications enables direct replication and adaptation to other narrow-leaved crops (wheat, rice, onion, leek), democratizing access to high-throughput phenotyping in resource-limited settings. The workflow advances plant phenomics by demonstrating that methodological transparency and cost-effectiveness need not compromise measurement precision, opening new avenues for phenotype-to-genotype mapping and predictive breeding in underutilized crops.
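Two quantities from this abstract are easy to make concrete: voxel-grid preprocessing of the point cloud and the MAPE used to compare automated and manual measurements. The sketch below is a generic illustration, not the authors' code: it assumes coordinates in metres (so the 2 cm voxel becomes `0.02`), uses centroid-per-voxel downsampling as one common variant, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel=0.02):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel) for c in p)  # integer voxel index
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in buckets.values()]

def mape(measured, reference):
    """Mean absolute percentage error of measured values against references."""
    errors = [abs(m - r) / abs(r) for m, r in zip(measured, reference)]
    return 100.0 * sum(errors) / len(errors)
```

A larger voxel merges more points and speeds up later steps at the cost of fine detail, which is the trade-off the reported sensitivity analysis examines.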