Abstract

Although base-stock policies are simple in themselves, determining them for complex, stochastic multi-echelon supply chains is often cumbersome or even analytically impossible. A wide range of heuristics has therefore been proposed for this purpose. This is the first study to treat the problem as a multi-armed bandit problem. In this context, we investigate two algorithms. First, we propose an approach based on upper confidence bounds and priority queues; this PQ-UCB algorithm drastically reduces the runtime of upper confidence bound allocation strategies in problems with large action spaces. Second, we apply the parameter-free sequential halving (SH) algorithm. Across various scenarios, we compare the performance of both algorithms with that of a genetic algorithm and a simulated annealing algorithm taken from the literature. Both PQ-UCB and SH outperform the two benchmark metaheuristics and require substantially less parameter-tuning effort (none at all in the case of SH). As multi-armed bandits are not yet common in inventory optimisation, we aim to emphasise their strengths and hope to promote their adoption in other domains of supply chain management.
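To illustrate the bandit view, the following is a minimal sketch of generic sequential halving, not the implementation evaluated in this study. Each candidate base-stock level is treated as an arm, and the sampling budget is split evenly over roughly log2(n) rounds, with the worse half of the arms discarded after each round. The names `sequential_halving` and `toy_cost`, the single-stage cost model, the uniform demand, and the cost rates of 1 and 9 are all assumptions made for illustration; a multi-echelon study would replace `toy_cost` with a supply chain simulation.

```python
import math
import random


def toy_cost(level, rng):
    """Hypothetical one-period cost of a base-stock level (illustration only).

    Demand is uniform on 0..20; holding costs 1 per unit, backorders 9 per unit.
    """
    demand = rng.randint(0, 20)
    return 1.0 * max(level - demand, 0) + 9.0 * max(demand - level, 0)


def sequential_halving(arms, pull, budget, rng):
    """Return the arm with the lowest estimated mean cost.

    arms   -- hashable candidates (here: integer base-stock levels)
    pull   -- function (arm, rng) -> one noisy cost sample
    budget -- approximate total number of samples to spend
    """
    survivors = list(arms)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Spread this round's share of the budget evenly over the survivors.
        pulls_per_arm = max(1, budget // (len(survivors) * rounds))
        mean_cost = {
            arm: sum(pull(arm, rng) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better (lower-cost) half, rounding up.
        survivors.sort(key=mean_cost.__getitem__)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


if __name__ == "__main__":
    best = sequential_halving(range(21), toy_cost, 200_000, random.Random(0))
    print("selected base-stock level:", best)
```

Apart from the sampling budget, the sketch has no tuning parameters, which matches the parameter-free character of SH noted above. Estimates are recomputed from fresh samples in every round, so an arm's early noisy estimates do not carry over.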
