Beamforming is an essential technology in 5G Massive Multiple-Input Multiple-Output (MMIMO) communications, which are subject to many impairments due to the nature of the wireless transmission channel. Inter-Cell Interference (ICI) is one of the main obstacles faced by 5G communications due to frequency-reuse technologies. However, finding the optimal beamforming parameter to minimize the ICI requires prior network or channel information that is infeasible to obtain. In this paper, we propose a dynamic Q-learning beamforming method for ICI mitigation in the 5G downlink that does not require prior network or channel knowledge. Compared with a traditional beamforming method and other industrial Reinforcement Learning (RL) methods, the proposed method has lower computational complexity and better convergence efficiency. Performance analysis shows an improvement in quality of service, measured by the Signal-to-Interference-plus-Noise Ratio (SINR), and robustness across different environments.