Articles published on Fusion Of Depth Images
6 Search results
- Research Article
- 10.3390/agriculture16060704
- Mar 21, 2026
- Agriculture
- Yang Guo + 7 more
Real-time measurement of tea bud phenotypes on mobile devices is constrained by the difficulty of making models lightweight, and non-contact, key-point-based measurement of tea bud phenotypes remains largely unexplored. Information on the growth posture of tea buds is an important basis for determining tea maturity grades, monitoring quality, and breeding tea. This work therefore develops a deep learning-based YOLOv8p-Tea model that estimates the key points of tea bud posture and, by integrating depth information, automatically obtains three-dimensional point cloud information of tea buds, thereby achieving in situ measurement of tea bud phenotypic parameters. The model is trained and validated on a tea bud (one-bud-three-leaf) image dataset, and its effectiveness is demonstrated through experiments. The model achieves a mAP50 of 98.3% and a precision (P) of 97% with 0.72 M parameters; compared with the YOLOv8p-pose model, mAP50 and P improve by 1.5% and 1.9%, respectively, and the parameter count is reduced by 25%. To validate the accuracy of phenotypic extraction, the model was deployed on edge devices, and 30 tea buds with one bud and three leaves were randomly selected in a tea garden. The final in situ measurements showed a mean relative error (MRE) of 6.63%. The experimental findings indicate that the developed method not only estimates tea bud posture effectively but also measures tea bud phenotypes accurately in situ, which gives it potential for meeting the construction needs of smart tea gardens and optimizing tea breeding.
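The abstract above does not say how key points and depth are combined, so the following is only a minimal sketch of one common approach: back-projecting detected key points through a pinhole camera model and scoring measurements by relative error. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the reading of MRE as mean relative error are assumptions made for illustration, not details taken from the paper.

```python
import math


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with depth in metres to camera-frame (X, Y, Z).

    Assumes an ideal pinhole camera with focal lengths fx, fy and
    principal point (cx, cy), all in pixels (hypothetical values).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def length_3d(p, q):
    """Euclidean distance between two 3D key points."""
    return math.dist(p, q)


def mean_relative_error(measured, reference):
    """Average of |measured - reference| / |reference| over paired values."""
    if not measured or len(measured) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(m - r) / abs(r) for m, r in zip(measured, reference)]
    return sum(errors) / len(errors)
```

In such a setup, a phenotypic length (for example a leaf length) would be the 3D distance between two back-projected key points, compared against a manual measurement.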
- Research Article
- 10.1109/tnnls.2024.3352974
- Feb 1, 2025
- IEEE transactions on neural networks and learning systems
- Yingkui Zhang + 6 more
The widely deployed ways of capturing a set of unorganized points, e.g., merged laser scans, fusion of depth images, and structure-from-x, usually yield a noisy 3-D point cloud. Accurate normal estimation for the noisy point cloud makes a crucial contribution to the success of various applications. However, existing normal estimation approaches strive to meet the conflicting goals of filtering normals and preserving surface features at the same time, which inevitably leads to inaccurate estimation results. We propose a normal estimation neural network (Norest-Net), which regards normal filtering and feature preservation as two separate tasks, so that each one is specialized rather than traded off. For full noise removal, we present a normal filtering network (NF-Net) branch that learns a mapping from the noisy height map descriptor (HMD) of each point to the ground-truth (GT) point normal; for surface feature recovery, we construct a normal refinement network (NR-Net) branch that learns a mapping from the bilaterally defiltered point normal descriptor (B-DPND) to the GT point normal. Moreover, NR-Net is detachable and can be incorporated into existing normal estimation methods to boost their performance. Norest-Net shows clear improvements over the state of the art in both feature preservation and noise robustness on synthetic and real-world captured point clouds.
- Research Article
- 10.1016/j.biosystemseng.2024.05.005
- May 20, 2024
- Biosystems Engineering
- Chunming Wen + 9 more
Height estimation of sugarcane tip cutting position based on multimodal alignment and depth image fusion
- Research Article
- 10.1109/access.2020.2973003
- Jan 1, 2020
- IEEE Access
- Yanjun Peng + 6 more
Unstructured point clouds are a representative shape representation of real-world scenes in 3D vision and graphics. Incompleteness inevitably arises from the way the set of unorganized points is captured, e.g., by fusion of depth images, merged laser scans, or structure-from-x. In this paper, an end-to-end sparse-to-dense multi-encoder neural network (termed SDME-Net) is proposed for uniformly completing an unstructured point cloud while preserving its shape details. Most existing learning-based shape completion methods operate on 2D image or 3D voxel representations of point clouds and require priors on the underlying shape's structure, topology, and annotations. In contrast, SDME-Net operates directly on the incomplete and even noisy point cloud without any transformation, and makes no specific assumptions about the distribution of missing regions or the geometric features of the input. Specifically, the defective point cloud is completed and optimized in a two-stage, sparse-to-dense manner. In the first stage, we generate a sparse but complete point cloud using a bistratal PointNet; in the second stage, we produce a dense, high-fidelity point cloud by encoding and decoding the sparse first-stage result using PointNet++. Meanwhile, we combine a distance loss and a repulsion loss to generate more uniformly distributed output point clouds that are closer to their ground-truth counterparts. Qualitative and quantitative experiments on the public ShapeNet dataset show that our approach outperforms state-of-the-art learning-based point cloud shape completion methods in terms of real structure recovery, uniformity, and robustness to noise and partiality.
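The abstract names a distance loss and a repulsion loss without giving their exact forms. The sketch below assumes a symmetric Chamfer distance for the former and a simple hinge-style penalty on close neighbours for the latter; both choices, along with the names `radius` and `weight`, are illustrative assumptions rather than the paper's definitions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance using squared Euclidean distances."""
    if not a or not b:
        raise ValueError("point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def repulsion_loss(points, radius):
    """Penalize pairs of points closer than `radius` (hinge penalty)."""
    if not points:
        raise ValueError("point set must be non-empty")
    total = 0.0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j:
                total += max(0.0, radius - math.dist(p, q))
    return total / len(points)


def combined_loss(pred, target, radius, weight):
    """Distance term pulls pred toward target; repulsion spreads pred out."""
    return chamfer_distance(pred, target) + weight * repulsion_loss(pred, radius)
```

The distance term alone can be minimized by clumping predicted points onto a few target points, so a repulsion term is a common way to encourage the uniform coverage the abstract mentions.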
- Research Article
- 10.1504/ijwmc.2020.10030316
- Jan 1, 2020
- International Journal of Wireless and Mobile Computing
- Du Jiang + 4 more
Gesture recognition is a key research field in human-computer interaction. At present, most researchers focus on one-handed gesture recognition and pay little attention to bimanual (two-handed) gesture recognition. This paper presents a deep learning-based solution to tackle self-occlusion and self-similarity in bimanual gestures. A Kinect is used to collect many colour and depth images of different gestures, with each gesture performed by multiple individuals. Separate bimanual gesture recognition models are first trained on the colour images and on the depth images; the colour and depth images are then fused, and a third bimanual gesture recognition model is trained on the fused images. The recognition performance of the three models is then compared. The experimental results show that, in terms of both single-gesture precision and mean average precision, the fused model recognizes bimanual gestures better than the models based on colour images or depth images alone.
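The abstract does not specify how the colour and depth images are fused. One simple possibility, shown here purely as an assumed illustration, is early fusion: appending a normalized depth channel to each RGB pixel so that a single model sees a four-channel input. The function name `fuse_rgbd` and the `max_depth` clipping value are hypothetical.

```python
def fuse_rgbd(rgb, depth, max_depth):
    """Stack colour and depth into one H x W grid of (r, g, b, d) in [0, 1].

    rgb:   H x W nested list of (r, g, b) tuples with values 0..255
    depth: H x W nested list of raw depth readings (same units as max_depth)
    """
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("colour and depth images must have the same size")
    fused = []
    for rgb_row, depth_row in zip(rgb, depth):
        row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            # Clip depth to [0, max_depth] before scaling to [0, 1].
            d_norm = min(max(d / max_depth, 0.0), 1.0)
            row.append((r / 255, g / 255, b / 255, d_norm))
        fused.append(row)
    return fused
```

This assumes the colour and depth frames are already registered to the same pixel grid.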
- Research Article
- 10.3724/sp.j.1089.2018.16771
- Jan 1, 2018
- Journal of Computer-Aided Design & Computer Graphics
- Yan Xu + 5 more
This paper presents a new method for human action recognition that fuses depth images and skeletal maps. Each depth image is represented by 2D and 3D auto-correlation of gradients features, extracted from the depth images using spatial and orientational auto-correlation. Mutual information is used to define the similarity between frames in the skeleton sequence, and key frames are then extracted from the sequence. The skeleton features extracted from the key frames serve as complementary features to compensate for the loss of temporal information in the depth images. The two feature sets are fed to two extreme learning machine classifiers, and a different weight is assigned to each feature set. Using different classifier weights provides more flexibility across features. The final class label is determined from the fused result. Experiments conducted on the MSR_Action3D depth action data set show that the accuracy of the proposed method is 1.5% higher than that of state-of-the-art action recognition methods.
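The weighted fusion of the two classifiers can be sketched as a convex combination of their per-class scores followed by an argmax. The abstract does not give the actual fusion rule or weights, so the function name `fuse_scores` and the single `depth_weight` parameter are assumptions for illustration.

```python
def fuse_scores(depth_scores, skeleton_scores, depth_weight):
    """Combine per-class scores from two classifiers and pick a label.

    depth_weight in [0, 1] is the weight on the depth-feature classifier;
    the skeleton-feature classifier receives 1 - depth_weight.
    Returns (predicted_class_index, fused_scores).
    """
    if not depth_scores or len(depth_scores) != len(skeleton_scores):
        raise ValueError("score vectors must be non-empty and of equal length")
    if not 0.0 <= depth_weight <= 1.0:
        raise ValueError("depth_weight must lie in [0, 1]")
    fused = [
        depth_weight * d + (1.0 - depth_weight) * s
        for d, s in zip(depth_scores, skeleton_scores)
    ]
    return fused.index(max(fused)), fused
```

With scores `[0.6, 0.4]` and `[0.2, 0.8]`, equal weights select class 1, whereas a depth weight of 0.9 selects class 0. This shows how the weights shift the final decision toward the more trusted feature set.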