In this study, the authors demonstrate how Q-learning, a model-free reinforcement learning (RL) technique, can be used to optimize routing in a grid-based environment. The study aims to assess the efficacy of Q-learning in enhancing routing for agricultural supply chains, investigate its flexibility in dynamic environments, and compare its performance across several real-world scenarios. In the specific case of a banana supply chain, an agent moves between entities in the system, from local growers to small traders and warehouses. The routing problem is modeled as a Markov Decision Process (MDP) in which the agent's goal is to maximize cumulative reward. Several scenarios are simulated: finding an optimal route for a given visit sequence while accounting for charging time, rerouting around non-drivable paths when unexpected blockages occur, avoiding high-wear areas to limit energy and wear penalties, and minimizing overall costs. The results demonstrate the adaptability and robustness of Q-learning in dynamic environments, obtaining near-optimal solutions across diverse settings. The present study adds to a growing body of research on the application of RL in logistics and supply chain management, highlighting its potential to enhance decision-making in complex and variable environments. The findings suggest that Q-learning can effectively balance multiple objectives, such as minimizing distance, reducing costs, and avoiding high-wear areas, making it a valuable tool for optimizing routing in real-world supply chains. Future work will explore broader applications and other RL algorithms in similar contexts.
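The abstract does not reproduce the authors' implementation; the following is a minimal sketch of the general technique it describes, tabular Q-learning on a grid modeled as an MDP. The environment, reward values, and all names (GridEnv, q_learning, the per-step cost, the blocked cells standing in for road blockages) are illustrative assumptions, not the paper's actual state space or reward design.

```python
import random
from collections import defaultdict

# Hypothetical minimal grid environment: states are (row, col) cells,
# actions move the agent up/down/left/right, episodes end at the goal.
class GridEnv:
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, rows=5, cols=5, goal=(4, 4), blocked=frozenset()):
        self.rows, self.cols = rows, cols
        self.goal = goal
        self.blocked = blocked  # non-drivable cells (stand-in for blockages)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.state[0] + dr, 0), self.rows - 1)
        c = min(max(self.state[1] + dc, 0), self.cols - 1)
        if (r, c) not in self.blocked:  # blocked moves leave the agent in place
            self.state = (r, c)
        done = self.state == self.goal
        # -1 per step encourages short routes; +10 for reaching the goal.
        reward = 10.0 if done else -1.0
        return self.state, reward, done

def q_learning(env, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(lambda: [0.0] * len(env.ACTIONS))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.randrange(len(env.ACTIONS))
            else:
                a = max(range(len(env.ACTIONS)), key=lambda i: q[s][i])
            s2, r, done = env.step(a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    env = GridEnv(blocked=frozenset({(2, 2), (2, 3)}))  # simulate a blockage
    q = q_learning(env)
    # Greedy rollout of the learned policy.
    s, done, path = env.reset(), False, [(0, 0)]
    while not done and len(path) < 50:
        a = max(range(len(env.ACTIONS)), key=lambda i: q[s][i])
        s, _, done = env.step(a)
        path.append(s)
    print(path)
```

In this sketch the competing objectives the abstract mentions (distance, cost, wear) would enter through the reward function; here they are collapsed into a single per-step penalty for simplicity.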