RGB-D Images Research Articles

In farm automation, fruit detection underpins yield prediction, automatic picking, and other orchard operations. RGB images capture only two-dimensional information about the scene, which is not sufficient to distinguish fruits that grow densely or are occluded by branches and leaves. With the development of depth sensors, RGB-D images, which carry complementary information, can boost the performance of fruit detection. However, owing to the nature of the sensors and the scene configurations, outdoor depth images are of poor quality, which poses a challenge when fusing RGB-D features. This paper therefore proposes an end-to-end RGB-D object detection network, termed the noise-tolerant feature fusion network (NT-FFN), to make proper use of outdoor multi-modal data and improve detection accuracy. Specifically, the NT-FFN first uses two structurally identical feature extractors to extract single-modal (color and depth) features, which form the basis of the subsequent feature fusion. Then, to avoid introducing too much depth noise and to focus perception on the important parts of the features, an attention-based fusion module adaptively fuses the multi-modal features. Finally, multi-scale features from the color images and the fusion modules are used to predict object positions, which improves the network's ability to detect multi-scale objects and further enhances its noise immunity. In addition, this paper constructs an RGB-D citrus fruit dataset for comprehensive evaluation of the proposed network. On this dataset, the NT-FFN achieves an AP50 of 95.4% at real-time speed, outperforming single-modal methods, common multi-modal fusion strategies, and advanced multi-modal detection methods. The NT-FFN also achieves excellent detection results in other fruit detection tasks, which verifies its generalization ability. This study provides a foundation for multi-modal information fusion in outdoor fruit detection.
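
The abstract does not specify how the attention-based fusion module is built, so the following is only a minimal sketch of one common way to gate a noisy modality, not the NT-FFN design. It assumes a per-channel sigmoid gate computed from globally pooled color and depth features; the names `gated_fusion`, `weights`, and `biases` are hypothetical, and in a real network the gate parameters would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_mean(channel):
    """Global average of one channel, given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def gated_fusion(rgb_feat, depth_feat, weights, biases):
    """Fuse color and depth feature maps with a per-channel gate.

    rgb_feat, depth_feat: C channels, each a list of rows (same shape).
    weights: C rows of 2*C gate weights (hypothetical, normally learned).
    biases: C gate biases (hypothetical, normally learned).
    Returns the fused feature maps and the gate value of each channel.
    """
    # One pooled descriptor per channel and modality, concatenated (length 2*C).
    pooled = [channel_mean(ch) for ch in rgb_feat]
    pooled += [channel_mean(ch) for ch in depth_feat]

    # A gate in (0, 1) per channel decides how much depth is let through.
    gates = [
        sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
        for w_row, b in zip(weights, biases)
    ]

    # Color features pass unchanged; depth features are scaled by the gate.
    fused = [
        [[r + g * d for r, d in zip(r_row, d_row)]
         for r_row, d_row in zip(r_ch, d_ch)]
        for r_ch, d_ch, g in zip(rgb_feat, depth_feat, gates)
    ]
    return fused, gates
```

In this sketch, a gate close to zero leaves the color features essentially untouched, which is one simple way a network could limit the influence of unreliable depth channels.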

Non-destructive picking of fresh tomatoes is a delicate agronomic operation that relies on comprehensive information about the plant organs, such as the locations of the stem, peduncle, and fruits. Matching the visual information supplied to the information demanded by the agronomic technique is key to moving picking robots from the laboratory to the field. Three-dimensional pose information, which contains the location of each organ of the plant, can meet this demand and is a prerequisite for handling fruit clusters precisely. To enable fine harvesting of whole tomato bunches, this paper proposes a three-dimensional pose detection method for tomato bunches. The method, named Tomato Pose Method (TPM), is composed of a prior geometric model, a cascaded multi-task network, and a three-dimensional reconstruction process. Based on prior knowledge and agronomic technology, the prior geometric model comprehensively and flexibly describes the spatial location of a tomato bunch. The cascaded multi-task network is built on an hourglass structure and transfer learning, and is suited to bounding box and keypoint prediction for tomato bunches in complex environments. Finally, the tomato bunch is reconstructed by combining the prior geometric model with the spatial position of each keypoint. Only a medium-sized training dataset is needed: 1800 RGB-D images covering changing lighting, occlusion, and various poses. The success rate of TPM on two-dimensional keypoint detection is 94.02%, and 85.77% of the predicted points have accuracy at the medium level. In addition, 70.05% of tomato bunches in multiple poses can be reconstructed. More importantly, the method needs only one RGB-D image taken by a commercial camera to realize three-dimensional reconstruction of a single-bunch scenario in 1.0 s and of a multi-bunch scenario in 2.0 s. It provides comprehensive information and a data basis for target positioning and path planning of the picking robot, which makes non-destructive harvesting possible.
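
The abstract does not describe how keypoints are lifted into three dimensions, so the following is a sketch of the standard pinhole back-projection that RGB-D pipelines commonly use for this step, not the TPM reconstruction itself. The `Intrinsics` class, the `backproject` function, and the intrinsic values in the usage note are hypothetical, and depth is assumed to be given in metres and aligned to the color image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, k):
    """Lift pixel (u, v) with depth in metres to a camera-frame 3D point.

    Returns (X, Y, Z) in metres, or None when the depth reading is
    missing or invalid (zero, negative, or NaN).
    """
    if not depth_m > 0:
        return None
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)
```

For example, with assumed intrinsics `Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)`, a keypoint at the principal point (320, 240) with a depth of 0.5 m maps to a point on the optical axis, 0.5 m in front of the camera. Returning `None` for invalid depth matters outdoors, where depth images often contain holes.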

Articles published on RGB-D Images

Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning

TMFNet: Three-Input Multilevel Fusion Network for Detecting Salient Objects in RGB-D Images

Monocular Depth Estimation in Forest Environments

Automatic Weight Prediction System for Korean Cattle Using Bayesian Ridge Algorithm on RGB-D Image

Noise-tolerant RGB-D feature fusion network for outdoor fruit detection

Multi-View Visual Relationship Detection with Estimated Depth Map

Robust 3D face modeling and tracking from RGB-D images

Weakly supervised learning of multi-object 3D scene decompositions using deep shape priors

RLLNet: a lightweight remaking learning network for saliency redetection on RGB-D images

Bifurcation Fusion Network for RGB-D Salient Object Detection

Scale-aware network with modality-awareness for RGB-D indoor semantic segmentation

Efficient 6D object pose estimation based on attentive multi‐scale contextual information

Three-dimensional pose detection method based on keypoints detection network for tomato bunch

SE-ResUNet: A Novel Robotic Grasp Detection Method

Robotic Cable Routing with Spatial Representation

PANet: A Pixel-Level Attention Network for 6D Pose Estimation With Embedding Vector Features

Learning geodesic-aware local features from RGB-D images

Autonomous, Mobile Manipulation in a Wall-building Scenario: Team LARICS at MBZIRC 2020

An Image Synthesis Method Generating Underwater Images

A novel fuzzy clustering based method for image segmentation in RGB-D images
