Abstract

Precision agriculture relies on accurate knowledge of crop phenotypic traits at the sub-field level. While visual inspection by human experts has traditionally been adopted for phenotyping, sensors mounted on field vehicles are becoming valuable tools to increase accuracy at a finer scale while reducing execution time and labor costs. In this respect, automated processing of sensor data for accurate and reliable fruit detection and characterization is a major research challenge, especially when data consist of low-quality natural images. This paper investigates the use of deep learning frameworks for automated segmentation of grape bunches in color images from a consumer-grade RGB-D camera placed on board an agricultural vehicle. A comparative study, based on two image segmentation metrics, i.e. the segmentation accuracy and the well-known Intersection over Union (IoU), is presented to assess the performance of four pre-trained network architectures, namely AlexNet, GoogLeNet, VGG16, and VGG19. Furthermore, a novel strategy aimed at improving the segmentation of bunch pixels is proposed. It is based on an optimal threshold selection over the bunch probability maps, as an alternative to the conventional minimization of the cross-entropy loss of mutually exclusive classes. Results obtained in field tests show that the proposed strategy improves the mean segmentation accuracy of the four deep neural networks by between 2.10% and 8.04%. In addition, the comparative study of the four networks shows that the best performance is achieved by VGG19, which reaches a mean segmentation accuracy on the bunch class of 80.58%, with an IoU value for the bunch class of 45.64%.

Highlights

  • Accurate knowledge of crop characteristics at the sub-field level is crucial in precision agriculture

  • The cutting-edge approach of deep learning has been applied to the processing of low-quality VGA-resolution images (640 × 480 pixels), acquired by consumer-grade hardware mounted on board an agricultural vehicle moving in real environments

  • The proposed strategy selects the optimal threshold level that leads to the best Intersection over Union (IoU) of the bunch class at the training stage (a minimal sketch follows these highlights)
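
A minimal Python/NumPy sketch of this thresholding strategy is given below. It is not the authors' code: the array names, shapes, and the candidate threshold grid are assumptions. Given the bunch-class probability maps predicted on the training set and the corresponding binary ground-truth masks, it sweeps candidate threshold levels and keeps the one that maximizes the IoU of the bunch class, instead of the conventional arg-max decision over the softmax outputs.

    import numpy as np

    def bunch_iou(pred, gt):
        # IoU of the bunch (positive) class for two binary masks
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return intersection / union if union > 0 else 0.0

    def select_optimal_threshold(prob_maps, gt_masks, levels=np.linspace(0.05, 0.95, 19)):
        # prob_maps: (N, H, W) bunch-class probabilities from the trained network
        # gt_masks:  (N, H, W) binary ground-truth masks (1 = bunch pixel)
        best_t, best_iou = 0.5, -1.0
        for t in levels:
            iou = bunch_iou(prob_maps >= t, gt_masks.astype(bool))
            if iou > best_iou:
                best_t, best_iou = t, iou
        return best_t, best_iou

    # Usage (hypothetical arrays):
    # t_opt, iou_opt = select_optimal_threshold(prob_maps, gt_masks)
    # segmentation = prob_maps >= t_opt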


Summary

Introduction

Accurate knowledge of crop characteristics at the sub-field level is crucial in precision agriculture. Conventional approaches are based on human intervention, i.e. a few analysis campaigns during which agronomists perform visual and even destructive inspections of each row of the vineyard. This process is highly time-consuming, subjective, and prone to human error.

For network training, the momentum is always set to 0.9. The training process tries to converge to an optimal solution, at minimum cost (or loss), by heavily changing the weights of the last fully-connected layer (which is not pre-trained) and only slightly tuning the transferred ones. For this reason, different learning rates are proposed for the transferred layers (lower) and the last fully-connected layer (much higher): the learning rate of the transferred (first) layers is set to 10⁻⁴, whereas the last fully-connected layer of each of the four networks learns 20 times faster (learning rate of 2 × 10⁻³).
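
This fine-tuning setup can be sketched as follows. It is a hedged, minimal example: the paper does not specify the framework, so PyTorch, the VGG19 variant, and the two-class bunch/non-bunch output are assumptions. The newly added fully-connected layer and the transferred layers are placed in separate optimizer parameter groups, with the learning rates quoted above and momentum 0.9.

    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 2  # bunch vs. non-bunch (assumed)

    # Load an ImageNet pre-trained VGG19 and replace its last fully-connected
    # layer, which is therefore not pre-trained.
    model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    in_features = model.classifier[6].in_features
    model.classifier[6] = nn.Linear(in_features, num_classes)

    # Separate the parameters of the new layer from the transferred ones.
    new_params = list(model.classifier[6].parameters())
    new_ids = {id(p) for p in new_params}
    transferred_params = [p for p in model.parameters() if id(p) not in new_ids]

    # Lower learning rate for the transferred layers, 20 times higher for the
    # newly added fully-connected layer.
    optimizer = torch.optim.SGD(
        [
            {"params": transferred_params, "lr": 1e-4},
            {"params": new_params, "lr": 2e-3},
        ],
        momentum=0.9,
    )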

