This study explores the application of multi-armed bandit (MAB) algorithms to the dynamic pricing of crops, focusing on the evaluation of the adaptive upper confidence bound (asUCB) and Thompson Sampling (TS) algorithms. Through simulation experiments on historical data, the study analyzes how well these algorithms fit actual market price trends and respond to future price fluctuations. The results indicate that asUCB performed particularly well on both the training dataset and the simulation tests, achieving low mean squared error (MSE) and minimal cumulative regret, which reflects its rapid convergence and stable pricing. In contrast, although TS initially performed slightly worse, its strong adaptability gave it an advantage in handling market volatility. These findings indicate the potential of MAB algorithms for dynamic pricing and offer insights for pricing strategies in agricultural product markets.