Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands

Nicola Secomandi

doi:10.1016/s0305-0548(99)00146-x

Abstract

The paper considers a version of the vehicle routing problem where customers’ demands are uncertain. The focus is on dynamically routing a single vehicle to serve the demands of a known set of geographically dispersed customers during real-time operations. The goal consists of minimizing the expected distance traveled in order to serve all customers’ demands. Since actual demand is revealed upon arrival of the vehicle at the location of each customer, fully exploiting this feature requires a dynamic approach. This work studies the suitability of the emerging field of neuro-dynamic programming (NDP) in providing approximate solutions to this difficult stochastic combinatorial optimization problem. The paper compares the performance of two NDP algorithms: optimistic approximate policy iteration and a rollout policy. While the former improves the performance of a nearest-neighbor policy by 2.3%, the computational results indicate that the rollout policy generates higher quality solutions. The implication for the practitioner is that the rollout policy is a promising candidate for vehicle routing applications where a dynamic approach is required.

Scope and purpose

Recent years have seen a growing interest in the development of vehicle routing algorithms to cope with the uncertain and dynamic situations found in real-world applications (see the recent survey paper by Powell et al. [1]). As noted by Psaraftis [2], dramatic advances in information and communication technologies provide new possibilities and opportunities for vehicle routing research and applications. The enhanced capability of capturing the information that becomes available during real-time operations opens up new research directions. This informational availability provides the possibility of developing dynamic routing algorithms that take advantage of the information that is dynamically revealed during operations. Exploiting such information presents a significant challenge to the operations research/management science community.

The single vehicle routing problem with stochastic demands [3] provides an example of a simple, yet very difficult to solve exactly, dynamic vehicle routing problem [2, p. 157]. The problem can be formulated as a stochastic shortest path problem [4] characterized by an enormous number of states. Neuro-dynamic programming [5,6] is a recent methodology that can be used to approximately solve very large and complex stochastic decision and control problems. In this spirit, this paper is meant to study the applicability of neuro-dynamic programming algorithms to the single-vehicle routing problem with stochastic demands.
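To make the idea of a rollout policy built on a nearest-neighbor base policy concrete, the following is a minimal Python sketch. It is not the paper's implementation, and every modeling detail in it is an assumption made for illustration: demands are drawn uniformly from 1 to `max_demand`, a route failure triggers one round trip to the depot (node 0), preemptive returns to the depot are not modeled, and the cost-to-go of the base policy is estimated by Monte Carlo sampling. All function and parameter names (`serve`, `nn_cost`, `rollout_choice`, `run_episode`, `n_samples`) are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def serve(pos, load, cust, demand, coords, capacity):
    """Travel from pos to cust and serve its demand, revealed on arrival.

    Returns (distance traveled, load remaining). If demand exceeds the load,
    the vehicle makes a round trip to the depot (node 0) to replenish.
    Assumption: demand never exceeds capacity, so one replenishment suffices.
    """
    d = dist(coords[pos], coords[cust])
    if demand > load:
        d += 2 * dist(coords[cust], coords[0])
        load += capacity
    return d, load - demand


def nearest(pos, unserved, coords):
    """Nearest unserved customer; ties are broken by customer index."""
    return min(unserved, key=lambda c: (dist(coords[pos], coords[c]), c))


def nn_cost(pos, load, unserved, coords, capacity, max_demand, rng):
    """One simulated run of the nearest-neighbor base policy to completion."""
    total, unserved = 0.0, set(unserved)
    while unserved:
        nxt = nearest(pos, unserved, coords)
        d, load = serve(pos, load, nxt, rng.randint(1, max_demand), coords, capacity)
        total += d
        pos = nxt
        unserved.remove(nxt)
    return total + dist(coords[pos], coords[0])  # final return to the depot


def rollout_choice(pos, load, unserved, coords, capacity, max_demand, rng, n_samples=30):
    """Pick the next customer by one-step lookahead over the base policy.

    For each candidate, average (over sampled demands) the cost of serving it
    next plus the simulated nearest-neighbor cost of finishing the route.
    """
    best, best_cost = None, math.inf
    for c in sorted(unserved):
        est = 0.0
        for _ in range(n_samples):
            d, new_load = serve(pos, load, c, rng.randint(1, max_demand), coords, capacity)
            est += d + nn_cost(c, new_load, unserved - {c}, coords, capacity, max_demand, rng)
        est /= n_samples
        if est < best_cost:
            best, best_cost = c, est
    return best


def run_episode(coords, capacity, max_demand, rng, use_rollout):
    """Route one vehicle until all customers are served; return total distance."""
    pos, load, total = 0, capacity, 0.0
    unserved = set(range(1, len(coords)))
    while unserved:
        if use_rollout:
            nxt = rollout_choice(pos, load, unserved, coords, capacity, max_demand, rng)
        else:
            nxt = nearest(pos, unserved, coords)
        d, load = serve(pos, load, nxt, rng.randint(1, max_demand), coords, capacity)
        total += d
        pos = nxt
        unserved.remove(nxt)
    return total + dist(coords[pos], coords[0])
```

Calling `run_episode(coords, capacity, max_demand, random.Random(seed), use_rollout=True)` and again with `use_rollout=False` over many seeds gives a rough way to compare the two policies on a toy instance. Because the lookahead is sampled, a single episode of this sketch is not guaranteed to beat the base policy.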
