Short-cycle agricultural product sales forecasting significantly reduces food waste by accurately predicting demand, ensuring producers match supply with consumer needs. However, the forecasting is often subject to uncertain factors, resulting in highly volatile and discontinuous data. To address this, a hierarchical prediction model that combines RF-XGBoost is proposed in this work. It adopts the Random Forest (RF) in the first layer to extract residuals and achieve initial prediction results based on correlation features from Grey Relation Analysis (GRA). Then, a new feature set based on residual clustering features is generated after the hierarchical clustering is applied to classify the characteristics of the residuals. Subsequently, Extreme Gradient Boosting (XGBoost) acts as the second layer that utilizes those residual clustering features to yield the prediction results. The final prediction is by incorporating the results from the first layer and second layer correspondingly. As for the performance evaluation, using agricultural product sales data from a supermarket in China from 1 July 2020 to 30 June 2023, the results demonstrate superiority over standalone RF and XGBoost, with a Mean Absolute Percentage Error (MAPE) reduction of 10% and 12%, respectively, and a coefficient of determination (R2) increase of 22% and 24%, respectively. Additionally, its generalization is validated across 42 types of agricultural products from six vegetable categories, showing its extensive practical ability. Such performances reveal that the proposed model beneficially enhances the precision of short-term agricultural product sales forecasting, with the advantages of optimizing the supply chain from producers to consumers and minimizing food waste accordingly.
Read full abstract