Efficient proactive emergency vehicle planning is crucial for effective emergency response management systems. This paper develops a flexible deep reinforcement learning framework to address the uncertainties arising from dynamic emergency requests and complex traffic conditions. A Markov decision process (MDP) model is proposed to optimize ambulance dispatch and reallocation, with the objectives of minimizing average response time, reducing delays caused by traffic congestion and patient-severity misclassification, and enhancing overall fairness. To tackle the challenge posed by the unbounded MDP state space, we employ a least-squares-based approximate policy iteration model to obtain an upper bound and a linear programming model to obtain a lower bound. Using data from a Chinese emergency medical services system, we conduct computational experiments and evaluate performance on average response time, fraction of delayed responses, risk level, and the Gini coefficient. The numerical results demonstrate that our optimal flexible policy outperforms the first-in-first-served rule, improving both efficiency and equity in medical decision-making.