Background: The dynamic vehicle routing problem (DVRP) is a complex optimization problem central to applications such as last-mile delivery. Our goal is to develop an application that makes real-time decisions to maximize overall performance while adapting to dynamically arriving orders. We consider a DVRP variant in which new customer requests arrive dynamically and each must be immediately accepted or rejected.

Methods: This study tackles the DVRP with reinforcement learning (RL), a machine learning paradigm in which an agent learns a decision policy from feedback on its actions. We present a detailed RL formulation and systematically investigate how individual state-space components affect algorithm performance. Our approach incrementally modifies the state space: analyzing the contribution of individual components, applying data transformation methods, and incorporating derived features.

Results: Our findings demonstrate that a carefully designed state space significantly improves RL performance on the DVRP. Notably, incorporating derived features and selectively applying feature transformations enhanced the model's decision-making capabilities. Combining all enhancements yielded a statistically significant improvement over the basic state formulation.

Conclusions: This research provides insights into RL modeling for DVRPs, highlighting the importance of state-space design. The proposed approach offers a flexible framework applicable to various DVRP variants, with potential for validation on real-world data.
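Since the abstract centers on how state-space design drives RL performance, the Python sketch below illustrates the distinction between a basic state vector and one enriched with scaled features and derived features. This is a minimal illustration only: all field names, the specific derived features (vehicle-to-customer distance, capacity slack), and the scaling constants are assumptions for exposition, not details taken from the paper.

```python
# A minimal sketch (not the authors' implementation) of how a DVRP
# acceptance/rejection state vector might be assembled. All attribute
# names, derived features, and scaling choices are illustrative
# assumptions, not details from the paper.
import math
from dataclasses import dataclass


@dataclass
class Request:
    x: float          # customer x-coordinate
    y: float          # customer y-coordinate
    demand: float     # requested load


@dataclass
class VehicleState:
    x: float                    # current vehicle x-coordinate
    y: float                    # current vehicle y-coordinate
    remaining_capacity: float   # load the vehicle can still carry
    remaining_time: float       # fraction of the planning horizon left


def basic_state(req: Request, veh: VehicleState) -> list[float]:
    """Basic state: raw concatenation of request and vehicle attributes."""
    return [req.x, req.y, req.demand,
            veh.x, veh.y, veh.remaining_capacity, veh.remaining_time]


def enhanced_state(req: Request, veh: VehicleState,
                   area_size: float = 100.0,
                   max_demand: float = 10.0) -> list[float]:
    """Enhanced state: selectively scaled raw features plus derived
    features (vehicle-to-customer distance, capacity slack)."""
    dist = math.hypot(req.x - veh.x, req.y - veh.y)   # derived feature
    slack = veh.remaining_capacity - req.demand        # derived feature
    # Min-max scaling applied selectively to spatial/load features.
    return [req.x / area_size, req.y / area_size,
            req.demand / max_demand,
            veh.x / area_size, veh.y / area_size,
            veh.remaining_capacity / max_demand,
            veh.remaining_time,
            dist / (area_size * math.sqrt(2)),         # normalized distance
            slack / max_demand]                        # normalized slack


if __name__ == "__main__":
    req = Request(x=30.0, y=40.0, demand=2.0)
    veh = VehicleState(x=0.0, y=0.0, remaining_capacity=5.0,
                       remaining_time=0.7)
    print("basic:   ", basic_state(req, veh))
    print("enhanced:", enhanced_state(req, veh))
```

Either vector could serve as the observation fed to an RL agent that outputs an accept/reject action; the paper's contribution, per the abstract, is measuring how such state-space choices change learning performance.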