Efficient real-time power optimization of the direct methanol fuel cell (DMFC) system is crucial for enhancing its performance and reliability. The power of a DMFC system is mainly affected by the stack temperature and the circulating methanol concentration. However, methanol concentration cannot be measured directly with reliable sensors, which poses a challenge for real-time power optimization. To address this issue, this paper investigates the operating mechanism of the DMFC system and establishes a system power model. Based on this model, a reinforcement learning approach using the Q-learning algorithm is proposed to control the methanol supply and thereby optimize DMFC system power under varying operating conditions. The algorithm is simple, easy to implement, and does not rely on methanol concentration measurements. To validate its effectiveness, simulations comparing the proposed method with the traditional perturbation and observation (P&O) algorithm are conducted under different operating conditions. The results show that the proposed Q-learning-based power optimization improves net power by 1% and eliminates the methanol-supply fluctuation caused by P&O. To address practical implementation and the real-time requirements of the algorithm, hardware-in-the-loop (HIL) experiments are conducted. The experimental results demonstrate that the proposed method optimizes net power under different operating conditions. In terms of model accuracy, the experimental results match the simulations well. Moreover, under varying load conditions, the proposed Q-learning-based power optimization reduces the root mean square error (RMSE) from 7.271% to 2.996% and the mean absolute error (MAE) from 5.036% to 0.331% compared with P&O.
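The core idea above, selecting a methanol supply rate from measurable quantities alone via tabular Q-learning, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the net-power curve below is a toy stand-in (a concave function whose optimum shifts with stack temperature), and all flow rates, temperatures, and hyperparameters are hypothetical.

```python
import random

ACTIONS = [round(0.3 + 0.05 * i, 2) for i in range(9)]  # candidate flows (mL/min), hypothetical
TEMPS = [60, 70, 80]                                    # discretized stack temperature (deg C)

def net_power(flow, temp):
    """Toy net-power model (W): peaks at a temperature-dependent optimal flow.
    NOT the DMFC model identified in the paper; for illustration only."""
    optimum = 0.5 + 0.01 * (temp - 60)
    return 30.0 - 80.0 * (flow - optimum) ** 2

def train(steps=30000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning: state = stack temperature bin (measurable),
    action = methanol flow rate, reward = measured net power.
    No methanol concentration measurement is needed anywhere."""
    rng = random.Random(seed)
    Q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
    t = rng.choice(TEMPS)
    for _ in range(steps):
        # epsilon-greedy exploration over candidate flow rates
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(t, x)])
        r = net_power(a, t)             # reward: net power at this operating point
        t_next = rng.choice(TEMPS)      # operating conditions drift between steps
        target = r + gamma * max(Q[(t_next, x)] for x in ACTIONS)
        Q[(t, a)] += alpha * (target - Q[(t, a)])
        t = t_next
    return Q

if __name__ == "__main__":
    Q = train()
    for t in TEMPS:
        best = max(ACTIONS, key=lambda a: Q[(t, a)])
        print(f"T={t} C -> flow {best} mL/min")
```

Because the learned policy converges to a fixed flow per operating point rather than perturbing around it, this kind of controller avoids the steady-state supply oscillation inherent to P&O-style search.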