Urban transportation systems in tourism-centric cities face mounting pressure from rapid urbanization and population growth. Efficient, resilient, and sustainable bus route optimization is essential to ensure reliable service, minimize environmental impact, and maintain safety standards. This study presents a novel Hybrid Reinforcement Learning-Variable Neighborhood Strategy Adaptive Search (H-RL-VaNSAS) algorithm for multi-objective urban bus route optimization. Our mathematical model maximizes resilience, sustainability, safety, tourist satisfaction, and accessibility while minimizing total travel distance. H-RL-VaNSAS is evaluated against leading optimization methods, including the Crested Porcupine Optimizer (CPO), Krill Herd Algorithm (KHA), and Salp Swarm Algorithm (SSA). On metrics such as Hypervolume and the Average Ratio of Pareto Optimal Solutions, H-RL-VaNSAS outperforms all three. Specifically, H-RL-VaNSAS achieved the highest resilience index (550), sustainability index (370), safety score (480), tourist preferences score (300), and accessibility score (2300), along with the lowest total travel distance (950 km). Compared to the other methods, H-RL-VaNSAS improved resilience by 12.24–17.02%, sustainability by 5.71–12.12%, safety by 4.35–9.09%, tourist preferences by 7.14–13.21%, and accessibility by 4.55–9.52%, and it reduced travel distance by 9.52–17.39%. This research offers a framework for designing efficient, resilient, and sustainable public transit systems that align with urban planning and transportation goals. The integration of reinforcement learning with VaNSAS significantly enhances optimization capabilities, providing a valuable tool for the mathematical and urban transportation research communities.
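As an illustration of how Pareto-based metrics such as Hypervolume compare multi-objective solution sets, the following is a minimal Python sketch, not the evaluation code used in this study. It assumes only two objectives for readability, treats every objective as maximized (a minimized objective such as travel distance is negated first), and uses hypothetical function names (`dominates`, `pareto_front`, `hypervolume_2d`) and toy values that do not come from the reported results.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by a 2-D front and bounded below by the reference point.

    Assumes both objectives are maximized and that ref is no better than
    any front point on either objective.
    """
    area = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective to the smallest; each point
    # contributes a horizontal strip above the previously covered height.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


# Toy candidates as (resilience, -distance); values are hypothetical.
candidates = [(5.0, -10.0), (4.0, -12.0), (6.0, -11.0)]
front = pareto_front(candidates)
```

In this toy set the second candidate is dominated by the first (lower resilience and longer distance), so it is excluded from the front. A larger hypervolume relative to the same reference point indicates a front that covers more of the objective space; the six-objective setting described above would need a general n-dimensional hypervolume routine rather than this two-dimensional sweep.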