We examine methodologies for subnational crop yield and production forecasting that integrate Earth Observation (EO) datasets with machine learning approaches. We evaluate diverse input data types, cross-validation methods, and training durations, focusing on maize production and yield predictions in Burkina Faso and Somalia. Central to our analysis is a comparison of a panel data (PD) model, which uses time-invariant features, with a time-series data (TD) model. The TD model performed well in predicting both production and yield, while the PD model offered comparable yield predictions. Time-invariant features such as livelihood zones, soil properties, and cropland extents enriched the spatial representation of crop data, increasing the R-squared by 0.09 (0.21) for production and 0.11 (0.03) for yield, and reducing the Mean Absolute Percentage Error by 90 % (238 %) for production and 5 % (4 %) for yield in Burkina Faso (Somalia). While Burkina Faso's consistent crop data allowed effective modeling with brief training periods, Somalia benefited from the PD model's adaptability to outliers in crop statistics, particularly with extended training in high-producing regions. The PD approach showed promise in addressing data gaps, although predicting crop production for unobserved districts remained a challenge. Our findings highlight the value of integrating EO data and machine learning for agricultural forecasting and emphasize the importance of region-specific methodologies, especially in the rapidly changing landscape of EO data convergence.