End-to-end deep learning for directly estimating grape yield from ground-based imagery

Alexander G Olenskyj, Brent S Sams, Zhenghao Fei, Vishal Singh, Pranav V Raja, Gail M Bornhorst, J Mason Earles

doi:10.1016/j.compag.2022.107081

Open Access

Abstract

Yield estimation prior to harvest is a powerful tool in vineyard management, as it allows growers to fine-tune management practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the applicability of nondestructive proximal imaging combined with deep learning for yield estimation in vineyards. Continuous image collection with a vehicle-mounted sensing kit, combined with ground truth yield data collected at harvest using a commercial yield monitor, produced a large dataset of 23,581 yield points and 107,933 images. Moreover, the study was conducted in a mechanically managed commercial vineyard, which represents a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection model was trained on hand-labeled images to localize grape bunches; detections were then either counted or their pixel counts summed to obtain a metric that was correlated to grape yield. Conversely, regression models were trained end-to-end to predict grape yield directly from image data without the need for hand labeling. A transformer model and the object detection model with pixel area processing performed comparably, with mean absolute percent errors of 18% and 18.5%, respectively, on a representative holdout dataset. Saliency mapping showed that the attention of the CNN regression model was localized near the predicted location of grape bunches, as well as on the top of the grapevine canopy. Overall, the study demonstrated the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale. Additionally, the end-to-end modeling approach performed comparably to the object detection approach while eliminating the need for hand-labeling.
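
The two detection-derived metrics and the reported error measure can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes detections arrive as axis-aligned bounding boxes in pixel coordinates (the abstract does not say whether pixel counts come from boxes or masks), and the names `Box`, `bunch_count`, `bunch_pixel_area`, and `mape` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format (an assumption): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def bunch_count(boxes: List[Box]) -> int:
    """Count metric: number of detected grape bunches in one image."""
    return len(boxes)


def bunch_pixel_area(boxes: List[Box]) -> float:
    """Pixel-area metric: summed area of all detection boxes.

    Overlapping boxes are counted twice; degenerate boxes contribute zero.
    """
    return sum(
        max(0.0, x1 - x0) * max(0.0, y1 - y0)
        for x0, y0, x1, y1 in boxes
    )


def mape(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute percent error, in percent.

    Assumes every ground-truth value is nonzero.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    boxes = [(0, 0, 10, 20), (5, 5, 15, 15)]
    print(bunch_count(boxes))        # 2
    print(bunch_pixel_area(boxes))   # 300.0
    print(mape([10.0, 20.0], [9.0, 24.0]))
```

Either metric would still need to be related to measured yield, for example through a fitted calibration curve, before an error such as the reported 18.5% could be computed. The abstract states only that the metric was correlated to yield, so that step is left out here.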

Full Text