Prediction of Blueberry (Vaccinium corymbosum L.) Yield Based on Artificial Intelligence Methods

Gniewko Niedbała, Tomasz Wojciechowski, Bartosz Świderski, Izabella Antoniuk, Jarosław Kurek, Krzysztof Bobran

doi:10.3390/agriculture12122089


Open Access



Abstract

In this paper, we present a high-accuracy model for blueberry yield prediction, trained on structurally innovative data sets. Blueberries are flowering plants valued for their antioxidant and anti-inflammatory properties. Plantation yield depends on several factors, both internal and external. Accurately predicting the size of the harvest is important for work planning and for selecting storage space. Machine learning algorithms are commonly used in such prediction tasks because they can find correlations among the many factors at play. Data were collected over the years 2016–2021 and included agronomic, climatic and soil data, as well as vegetation data from satellite imaging. Additionally, growing periods defined by the BBCH scale, together with the corresponding aggregates, were taken into account. After extensive data preprocessing and derivation of cumulative features, a total of 11 models were trained and evaluated. The models were selected from state-of-the-art methods used in similar applications. Results were evaluated with the Mean Absolute Percentage Error (MAPE). This metric was preferred over alternatives because it uses absolute values, so errors of opposite sign cannot cancel each other out, and because it expresses the final result as the percentage difference between the actual and predicted values. In the presented research, the best-performing solution was the Extreme Gradient Boosting (XGBoost) algorithm, with a MAPE of 12.48%. This result meets the requirements of practical applications, being accurate enough to improve the overall yield management process. Because the approach is data-driven, the presented solution can be further improved as new data are collected each year.
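The argument for MAPE rests on absolute values preventing errors of opposite sign from cancelling. A minimal Python sketch of that point follows. It is an illustration only: the function names `mean_error` and `mape` and the two yield figures are hypothetical and are not code or data from the study. The sketch also assumes no actual yield is zero, since MAPE is undefined in that case.

```python
def mean_error(actual, predicted):
    """Mean signed error: positive and negative errors can cancel out."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent.

    Taking absolute values stops over- and under-predictions from
    cancelling; dividing by the actual value makes each error relative.
    Assumes every actual value is non-zero (MAPE is undefined otherwise).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical yields for two plots: one over-predicted, one under-predicted.
    actual = [10.0, 8.0]
    predicted = [11.0, 7.0]
    print(mean_error(actual, predicted))  # signed errors +1 and -1 cancel to 0.0
    print(mape(actual, predicted))        # (10% + 12.5%) / 2 = 11.25
```

With these hypothetical values the signed mean error reports a perfect fit, while MAPE correctly reports an average deviation of 11.25%.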
