Abstract

Grape cluster detection is a crucial step in the visual tasks of automated grape harvesting. Complex backgrounds and occlusion make it difficult to detect grape clusters in natural environments. To address this issue, several improvements are proposed. First, the public dataset is enriched with the data augmentation methods of random brightness change, left-right image flipping, and mosaic augmentation to strengthen the model's robustness. Second, to counter the loss of feature information in grape cluster detection, a plug-and-play spatial-to-depth convolution (STD-Conv) module is added to enrich grape cluster feature information; the original grape features are further fused by converting the spatial dimensions of the input image into the depth (channel) dimension. Third, a simple, parameter-free attention mechanism (SimAM) is applied to the backbone to increase the weights of grape targets and suppress background interference during feature extraction. Experiments show that combining STD-Conv and SimAM improves the accuracy of YOLOv4, YOLOv5, and YOLOX. The improved YOLOX model achieves the highest mean Average Precision (mAP) of 88.4%, with 87.8% precision and 79.5% recall. These findings demonstrate that the enhanced YOLOX model performs well for grape cluster detection, and the study offers valuable ideas for the detection of grapes and other fruits in automated harvesting.
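
The following is a minimal sketch of the two modules described above, assuming a PyTorch implementation. The abstract does not specify the block size, channel widths, kernel size, or where the modules are inserted into the YOLO backbones, so those choices here are illustrative assumptions rather than the authors' code; SimAM follows its published parameter-free formulation with the commonly used lambda of 1e-4.

```python
import torch
import torch.nn as nn


class SpaceToDepthConv(nn.Module):
    """Spatial-to-depth followed by a convolution: H and W are folded into the
    channel dimension, so no pixel information is discarded by striding.
    Block size and kernel size are illustrative assumptions."""

    def __init__(self, in_ch, out_ch, block=2):
        super().__init__()
        self.block = block
        # after pixel_unshuffle the channel count grows by block^2
        self.conv = nn.Conv2d(in_ch * block * block, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # (B, C, H, W) -> (B, C*block^2, H/block, W/block)
        x = nn.functional.pixel_unshuffle(x, self.block)
        return self.conv(x)


class SimAM(nn.Module):
    """Parameter-free attention: each activation is gated by the sigmoid of its
    inverse energy, computed from its deviation from the per-channel mean."""

    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        # squared deviation of each position from its channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # channel-wise variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy per neuron, then sigmoid gating
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 80, 80)          # dummy backbone feature map
    feats = SpaceToDepthConv(64, 128)(feats)    # -> (1, 128, 40, 40)
    feats = SimAM()(feats)                      # same shape, re-weighted
    print(feats.shape)
```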
