Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images

Ke Xu, Yan Zhu, Jun Ni, Xiaoping Jiang, Weixing Cao, Zhijian Jiang, Shuailong Li

doi:10.3389/fpls.2021.732968

Abstract

Single-modal images carry limited information for feature representation, and RGB images fail to detect grass weeds in wheat fields because these weeds are similar in shape to wheat. We propose a framework based on multi-modal information fusion for accurate detection of weeds in wheat fields in a natural environment, overcoming the limitations of a single modality in weed detection. First, we recode the single-channel depth image into a new three-channel image with a structure similar to that of an RGB image, which makes it suitable for feature extraction by a convolutional neural network (CNN). Second, multi-scale object detection is achieved by fusing the feature maps output by different convolutional layers. The three-channel network structure is designed to account both for the independence of the RGB and depth information and for the complementarity of the multi-modal information, and integrated learning is carried out through weight allocation at the decision level to fuse the multi-modal information effectively. The experimental results show that our method is significantly more accurate than weed detection based on RGB images alone. With integrated learning, the method achieves a mean average precision (mAP) of 36.1% for grass weeds and 42.9% for broad-leaf weeds, and an overall detection precision, measured by intersection over ground truth (IoG), of 89.3%, with the RGB and depth images weighted at α = 0.4 and β = 0.3, respectively. The results suggest that our method can accurately detect the dominant weed species in wheat fields and that multi-modal fusion can effectively improve object detection performance.
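The abstract reports results with IoG and a decision-level weighting of α = 0.4 (RGB) and β = 0.3 (depth). The Python sketch below shows one plausible reading of both, under stated assumptions: `iog` divides the overlap area by the ground-truth box area, and `fuse_scores` takes a weighted sum of per-detection confidence scores from three branches. The abstract does not give the weight of the third branch, so the sketch assumes γ = 1 − α − β; the function names, the (x1, y1, x2, y2) box format and fusion at the score level are likewise illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Boxes are (x1, y1, x2, y2) in pixel coordinates; this format is an assumption.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box (zero for empty or inverted boxes)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iog(pred: Box, gt: Box) -> float:
    """Intersection over ground truth: area(pred ∩ gt) / area(gt)."""
    inter: Box = (max(pred[0], gt[0]), max(pred[1], gt[1]),
                  min(pred[2], gt[2]), min(pred[3], gt[3]))
    gt_area = area(gt)
    return area(inter) / gt_area if gt_area > 0 else 0.0


def fuse_scores(rgb: Sequence[float], depth: Sequence[float],
                fused: Sequence[float], alpha: float = 0.4,
                beta: float = 0.3) -> List[float]:
    """Weighted decision-level fusion of scores from three branches.

    Hypothetical: the third weight gamma = 1 - alpha - beta is assumed,
    not reported in the abstract.
    """
    gamma = 1.0 - alpha - beta
    if gamma < 0:
        raise ValueError("alpha + beta must not exceed 1")
    if not (len(rgb) == len(depth) == len(fused)):
        raise ValueError("all branches must score the same detections")
    return [alpha * r + beta * d + gamma * f
            for r, d, f in zip(rgb, depth, fused)]


if __name__ == "__main__":
    print(iog((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.25: a quarter of gt covered
    print(iog((0, 0, 20, 20), (5, 5, 15, 15)))  # 1.0: oversized box covers gt
    print(fuse_scores([0.9, 0.2], [0.5, 0.4], [0.7, 0.6]))
```

Unlike IoU, IoG is not reduced when a predicted box is larger than the ground-truth box it covers, which is worth keeping in mind when comparing an IoG figure with IoU-based metrics.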

Highlights

Weeds are a major biological problem that limits the yield and quality of wheat by competing for light, water, fertilizer, and space (Munier-Jolain et al., 2013; Fahad et al., 2015).
In depth images, pixels in soil and weed areas had uniform color, whereas PHA images and RGB images had similar textures. These comparisons indicated that PHA images, obtained by recoding depth images, were similar to RGB images in information and structure and were more suitable than raw depth images for convolutional neural network (CNN)-based feature learning.
We proposed a three-channel weed detection method that fuses RGB and depth images and applies multi-scale object detection, which effectively improved the precision of weed detection in wheat fields.
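The highlights mention recoding depth images into three-channel PHA images, but this section does not define the PHA channels. The NumPy function below is a minimal, hypothetical sketch of what such a recoding can look like; it follows the general idea of the HHA encoding used in earlier RGB-D detection work (disparity, height above ground, surface-normal angle) rather than the authors' PHA definition. The channel choices, the 95th-percentile ground estimate, the downward-looking camera and the name `recode_depth` are all assumptions.

```python
import numpy as np


def _to_uint8(x: np.ndarray) -> np.ndarray:
    """Min-max rescale a float array to 0..255 (all zeros if constant)."""
    lo, hi = float(x.min()), float(x.max())
    if hi - lo < 1e-12:
        return np.zeros_like(x, dtype=np.uint8)
    return np.round(255.0 * (x - lo) / (hi - lo)).astype(np.uint8)


def recode_depth(depth: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Recode an HxW depth map into an HxWx3 uint8 image (hypothetical scheme).

    Channel 0: inverse depth (disparity-like), larger for closer surfaces.
    Channel 1: height above an assumed flat ground plane, taken as the
               95th-percentile depth for a downward-looking camera.
    Channel 2: angle between the local surface normal and the camera axis,
               from pixel-wise gradients (mixes depth units with pixels,
               a simplification).
    """
    d = np.maximum(depth.astype(np.float64), eps)  # avoid division by zero
    disparity = 1.0 / d
    ground = np.percentile(d, 95)
    height = np.clip(ground - d, 0.0, None)  # plants sit above the soil
    gy, gx = np.gradient(d)  # derivatives along rows and columns
    # For the surface z = d(x, y) the normal is (-gx, -gy, 1), so its angle
    # to the camera axis is arctan(|grad d|).
    angle = np.degrees(np.arctan(np.hypot(gx, gy)))
    return np.stack([_to_uint8(disparity), _to_uint8(height), _to_uint8(angle)],
                    axis=-1)


if __name__ == "__main__":
    # Flat soil at 1.0 m with a 0.2 m tall "plant" in the centre.
    demo = np.full((6, 6), 1.0)
    demo[2:4, 2:4] = 0.8
    img = recode_depth(demo)
    print(img.shape, img.dtype)
```

Rescaling every channel to 0..255 yields an array with the same shape and dtype as an RGB image, so it can be fed to a CNN backbone built for RGB input.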

Summary

Introduction

Weeds are a major biological problem that limits the yield and quality of wheat by competing for light, water, fertilizer, and space (Munier-Jolain et al., 2013; Fahad et al., 2015). Weeds in wheat fields include both grass and broad-leaf species (Gaba et al., 2010). Grass weeds have invaded and come to dominate wheat fields, and, like broad-leaf weeds, they threaten production (Ulber et al., 2009). They reduce wheat grain filling and have a greater impact on growth and yield (Siddiqui et al., 2010). Grass weeds resemble wheat in morphology and growth habit, which makes them difficult to distinguish from the crop.

Results

Discussion

Conclusion