To address congestion in high-speed networks, a Q-learning flow controller based on the Metropolis criterion is proposed. Because high-speed networks are uncertain and highly time-varying, complete and accurate information about them is difficult to obtain. The Q-learning algorithm, which does not depend on a mathematical model, is therefore well suited to high-speed networks. It learns the optimal Q-values through interaction with the environment and uses them to improve its behavior policy. The Metropolis criterion of the simulated annealing algorithm is used to balance exploration and exploitation in Q-learning. Through learning, the proposed controller learns to take the best action to regulate the source flow, achieving high throughput and a low packet loss ratio. Simulation results show that the proposed method improves network performance and effectively avoids congestion.
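
As an illustration of how the Metropolis criterion can balance exploration and exploitation in Q-learning, the following is a minimal Python sketch, not the controller evaluated here. It assumes a tabular Q-function stored as a list of lists and a commonly used acceptance rule, exp((Q(s, a_random) - Q(s, a_greedy)) / T). The names `metropolis_select`, `q_update`, `temperature`, `alpha`, and `gamma` are hypothetical. The state, action, and reward definitions of the flow controller are not given in this abstract and are left out.

```python
import math
import random


def metropolis_select(q_row, temperature, rng=random):
    """Pick an action index from one row of Q-values (assumed tabular).

    A random candidate action is compared with the greedy action and is
    accepted with probability exp((Q_candidate - Q_greedy) / temperature).
    The temperature must be positive.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # The exponent is <= 0, so the acceptance probability lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied to the table in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

At a high temperature the random candidate is accepted often, which favors exploration. As the temperature is lowered, the choice approaches the greedy action, which favors exploitation. The cooling schedule is a design choice that this sketch does not fix.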