To address traffic congestion and low operational efficiency in large-scale applications such as production plants and warehouses that use multiple automatic guided vehicles (multi-AGVs), this article proposes using an Improved Q-learning (IQL) algorithm and the Macroscopic Fundamental Diagram (MFD) for load balancing and congestion discrimination on road networks. Traditional Q-learning converges slowly, so we use the Q value updated in the previous iteration step as the maximum Q value of the next state, which reduces the number of Q value comparisons and improves the algorithm's convergence speed. When calculating the cost of AGV operation, traditional Q-learning uses an evaluation function based on distance alone; we introduce an improved reward and punishment mechanism that combines the AGV operating distance with the road network load, which balances the load across the road network. The MFD is a basic property of road networks. Based on the MFD combined with a Markov Chain (MC) model, we propose a road network congestion state discrimination method that classifies the congestion state according to the detected number of vehicles on the road network. The MC model accurately discriminated states in the range near the critical point. Finally, several simulations were run with different road network scales and load factors. The results indicate that the improved algorithm effectively balanced the load distribution of the road network, which substantially improved AGV operational efficiency.
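
The idea of combining operating distance and road network load in the reward can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the four-node network `GRAPH`, the weights `w_dist` and `w_load`, the linear form of `edge_reward`, and the hyperparameters. The update is the standard tabular Q-learning rule, not the IQL modification, whose details are not given above.

```python
import random

# Hypothetical road network: node -> {next_node: (distance, load)}.
# "load" is an assumed occupancy ratio in [0, 1] for that road segment.
GRAPH = {
    "A": {"B": (1.0, 0.9), "C": (1.5, 0.1)},
    "B": {"D": (1.0, 0.9)},
    "C": {"D": (1.5, 0.1)},
    "D": {},  # goal node, no outgoing segments
}


def edge_reward(distance, load, w_dist=1.0, w_load=5.0):
    """Penalty that grows with both travel distance and segment load.

    The linear form and the weights are illustrative assumptions.
    """
    return -(w_dist * distance + w_load * load)


def train(graph, start, goal, w_load, episodes=300,
          alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Standard tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    for _ in range(episodes):
        s = start
        while s != goal:
            actions = list(graph[s])
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                a = max(actions, key=lambda x: q[s][x])  # exploit
            dist, load = graph[s][a]
            r = edge_reward(dist, load, w_load=w_load)
            # Best value reachable from the next node (0 at the goal).
            future = max(q[a].values(), default=0.0)
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = a
    return q


def greedy_route(q, start, goal):
    """Follow the highest-valued action from start until the goal."""
    route = [start]
    while route[-1] != goal:
        s = route[-1]
        route.append(max(q[s], key=q[s].get))
    return route


balanced = greedy_route(train(GRAPH, "A", "D", w_load=5.0), "A", "D")
distance_only = greedy_route(train(GRAPH, "A", "D", w_load=0.0), "A", "D")
print("load-aware route:   ", balanced)
print("distance-only route:", distance_only)
```

In this toy network the route through `B` is shorter but heavily loaded. With `w_load=0.0` the learned policy should pick it, while with the load term enabled it should shift to the longer, lightly loaded route through `C`, which is the load-balancing effect the reward and punishment mechanism aims for.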