Efficient and reliable data routing is critical in Advanced Metering Infrastructure (AMI) within Smart Grids, as it determines overall network performance and resilience. This paper introduces Q-RPL, a novel Q-learning-based routing protocol designed to improve routing decisions in AMI deployments built on wireless mesh technologies. Q-RPL applies Reinforcement Learning (RL) to dynamically select optimal next-hop forwarding candidates, adapting to changing network conditions. The protocol operates on top of the standard IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL), augmenting it with learning-based decision-making. In extensive simulations carried out in scenarios based on real maps, Q-RPL significantly improves key performance metrics such as packet delivery ratio, end-to-end delay, and compliant factor compared with the standard RPL implementation and other benchmark algorithms from the literature. The adaptability and robustness of Q-RPL mark a significant advancement in routing protocols for Smart Grid AMI, promising greater efficiency and reliability for future intelligent energy systems. The findings also underscore the potential of Reinforcement Learning to improve networking protocols.