This study applies machine learning methods to predict the yield of potato varieties grown for French fries in Poland. Using a dataset of agronomic, climatic, soil, and satellite-based vegetation data from 36 commercial potato fields over five growing seasons (2018–2022), we developed three models: non-satellite, satellite, and hybrid. The non-satellite model relies on 85 features and excludes vegetation indices, whereas the satellite model includes these indices among its 128 features. The hybrid model combines all available features, 165 in total, and is the most comprehensive of the three. The hybrid model, particularly when combined with One-Class SVM outlier detection, performed best, achieving the lowest Mean Absolute Percentage Error (MAPE) of 5.85%, which underscores the value of integrating diverse data sources in agricultural yield prediction. The non-satellite and satellite models had higher MAPE values, indicating lower accuracy than the hybrid model. Data-processing techniques such as principal component analysis (PCA) and outlier detection methods (Local Outlier Factor (LOF) and One-Class SVM) played a pivotal role in model performance, supporting feature selection and dataset refinement. We conclude that machine learning methods, particularly when they draw on a wide array of data sources and advanced processing techniques, can significantly improve the accuracy of agricultural yield predictions. These insights pave the way for more efficient and better-informed agricultural practices, emphasising the potential of machine learning to improve yield prediction and crop management.
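Since the models are compared by MAPE, a minimal sketch of how that metric is computed may help when reading the reported 5.85%. The function name `mape` and the yield values below are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the Mean Absolute Percentage Error, expressed in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the absolute errors relative to each observed value.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical yields in t/ha, for illustration only (not data from the study).
observed = [40.0, 50.0, 60.0]
forecast = [38.0, 52.0, 63.0]
print(f"MAPE = {mape(observed, forecast):.2f}%")
```

For these made-up values the relative errors are 5%, 4%, and 5%, so the result is about 4.67%. A lower MAPE means the predictions sit closer, in relative terms, to the observed yields.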