Abstract

In logistics networks, empty container congestion and scarcity often stem from trade imbalance and supply-demand mismatch. This paper addresses the problem of empty container repositioning in maritime logistics and proposes a reinforcement learning framework with a self-adaptive mechanism for adjusting the weights of a multi-objective reward function, with the aim of improving container utilization and reducing scheduling costs. After reviewing the development of the empty container repositioning problem and the advantages of reinforcement learning in handling its temporal and spatial complexity, the problem is modelled as a Markov decision process and solved with reinforcement learning techniques. To meet the optimization objectives of reducing resource shortages at each location and minimizing resource repositioning costs, a multi-objective reward function is introduced to capture these mutually constraining preferences. The weights of the reward function are adjusted dynamically to track the agent's potentially time-varying preferences, mitigating the poor generalization associated with fixed-weight reward functions. Comparative experiments against conventional reinforcement learning algorithms demonstrate the superior performance of the proposed approach. Based on the results and the practical requirements of the case study, recommendations for addressing empty container repositioning are presented.
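The abstract's core idea, a scalarized multi-objective reward whose weights adapt over time rather than staying fixed, can be illustrated with a minimal sketch. The scheme below is an assumption for illustration only, not the paper's actual formulation: it re-normalizes the weights online from running averages of each objective's magnitude, so that neither the shortage penalty nor the repositioning cost dominates the scalar reward as their scales drift during training.

```python
# Illustrative sketch (hypothetical scheme, not the paper's method):
# a two-objective reward for empty container repositioning whose
# weights are re-normalized from running averages of each objective.

class AdaptiveReward:
    """Combine shortage and repositioning-cost penalties with
    self-adjusting weights (assumed inverse-magnitude scheme)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # smoothing rate for the running averages
        self.avg = [1.0, 1.0]   # running mean magnitude per objective

    def __call__(self, shortage, reposition_cost):
        objs = [shortage, reposition_cost]
        # Update the running average of each objective's magnitude.
        for i, o in enumerate(objs):
            self.avg[i] = (1 - self.alpha) * self.avg[i] + self.alpha * abs(o)
        # Inverse-magnitude weights: an objective that is currently large
        # is down-weighted so the two terms stay comparable in scale.
        inv = [1.0 / max(a, 1e-8) for a in self.avg]
        total = sum(inv)
        w = [v / total for v in inv]
        # Negative sign: both objectives are costs to be minimized.
        return -(w[0] * shortage + w[1] * reposition_cost)


reward = AdaptiveReward()
r = reward(shortage=5.0, reposition_cost=120.0)  # a single scalar reward
```

In an actual agent, this scalar would replace a fixed-weight reward inside the MDP's step function; the adaptive weights remove the need to hand-tune a trade-off coefficient for every port network and demand pattern.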
