Abstract
A wireless mesh network relies on an ad-hoc routing module for communication between nodes, forwarding traffic to node addresses according to the rules kept in the routing table. When links fail, the necessary routing information must be propagated through the remaining reachable nodes. A weakness of this conventional wireless mesh network is that its complex structure makes management and configuration adjustment difficult. To cope with this difficulty, we propose the use of a machine learning technique to implement dynamic routing that deals with failures in the wireless mesh network. The aim is to improve network routing via a reinforcement-learning framework; specifically, the Q-learning algorithm is applied. Results are reported for network scenarios under link-failure conditions, together with a sensitivity analysis of the algorithm's parameters. In addition, the proposed algorithm is compared with the Dijkstra algorithm with either no or immediate link-cost updates. The resulting end-to-end route latency is reported from simulations parameterized by real per-hop latency measurements from an actual outdoor testbed implementation. © 2021 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
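To illustrate the kind of Q-learning-based route selection summarized above, the following is a minimal sketch, not the authors' implementation: states are nodes, actions are next-hop choices, and the reward is the negative per-hop latency, so the learned policy favours low-latency routes and re-adapts after a simulated link failure. The topology, latency values, and hyperparameters (alpha, gamma, epsilon) are illustrative assumptions.

```python
# Minimal sketch (illustrative only) of Q-learning route selection over a
# failure-prone mesh. Topology, latencies, and hyperparameters are assumptions.
import random

# Hypothetical mesh: adjacency list with per-hop latency (ms) as link cost.
links = {
    "A": {"B": 5.0, "C": 12.0},
    "B": {"A": 5.0, "C": 4.0, "D": 9.0},
    "C": {"A": 12.0, "B": 4.0, "D": 6.0},
    "D": {"B": 9.0, "C": 6.0},
}
DEST = "D"

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = {n: {m: 0.0 for m in nbrs} for n, nbrs in links.items()}  # Q[node][next_hop]

def choose_next_hop(node):
    """Epsilon-greedy choice among the node's current neighbours."""
    nbrs = list(links[node])
    if random.random() < EPSILON:
        return random.choice(nbrs)
    return max(nbrs, key=lambda m: Q[node][m])

def run_episode(src):
    """Route one packet from src to DEST, updating Q along the way."""
    node, hops = src, 0
    while node != DEST and hops < 20:
        nxt = choose_next_hop(node)
        # Reward = negative per-hop latency, so lower-latency routes score higher.
        reward = -links[node][nxt]
        best_future = 0.0 if nxt == DEST else max(Q[nxt].values())
        Q[node][nxt] += ALPHA * (reward + GAMMA * best_future - Q[node][nxt])
        node, hops = nxt, hops + 1

def simulate_link_failure(a, b):
    """Drop the link a<->b; later episodes re-learn routes around it."""
    links[a].pop(b, None)
    links[b].pop(a, None)
    Q[a].pop(b, None)
    Q[b].pop(a, None)

for _ in range(500):
    run_episode("A")
simulate_link_failure("B", "D")
for _ in range(500):
    run_episode("A")
print({n: max(q, key=q.get) for n, q in Q.items() if q})  # learned next hops
```

In this sketch, the per-hop latency enters only through the reward, so the same update rule keeps working when a link disappears; no global recomputation (as in Dijkstra with immediate link-cost updates) is assumed.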