There is currently no high-precision, real-time model suitable for the control optimization of heat exchange systems; in particular, the overall heat transfer coefficient K of a heat exchanger is difficult to determine in real time from its operating parameters. This paper makes the following contributions. 1. We propose a dynamic model for the control and optimization of heat exchanger operation. A system is built to collect real-time operating data on the flow rates and temperatures on both sides of the heat exchanger, and these data are used to identify the overall heat transfer coefficient K. Combining the identified K with mechanistic equations, we establish a novel heat exchanger model that fuses mechanistic principles with reinforcement learning. 2. We validate the new model: the average relative error between the model's predicted temperatures and the measured values is below 5%, indicating high identification accuracy. Moreover, as the temperatures and flow rates vary, the identified K follows the expected trends. 3. To further improve identification accuracy, we study the choice of reward function in reinforcement learning. A model whose reward function is based on the Logarithmic Mean Temperature Difference (LMTD) achieves high identification accuracy; however, a model whose reward function is the relative error computed with the Arithmetic Mean Temperature Difference (AMTD) achieves even higher accuracy. The model is further validated under various operating conditions, such as changes in the hot-side flow rate, demonstrating good scalability and applicability.
This research provides a high-precision dynamic parameter basis for the precise control of heat exchange systems and offers practical guidance for the control optimization of heat exchange systems in operation.
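The identification step and the two reward choices can be illustrated with a minimal Python sketch. It assumes a counterflow exchanger with a known heat transfer area A, takes the measured heat duty from the hot-side energy balance Q = m_h c_p,h (T_h,in − T_h,out), and scores a candidate K by the relative error between K·A·ΔT_m and that duty, with ΔT_m computed as either the LMTD or the AMTD. This reward definition, the function names (`lmtd`, `amtd`, `reward`, `identify_k`) and the sample operating point are hypothetical, and the exhaustive search over candidate K values is a simple stand-in for the reinforcement learning agent, not the method used in this paper.

```python
import math
from typing import Callable, Iterable


def lmtd(dt1: float, dt2: float) -> float:
    """Logarithmic mean of the two terminal temperature differences."""
    if dt1 <= 0 or dt2 <= 0:
        raise ValueError("terminal temperature differences must be positive")
    if math.isclose(dt1, dt2):
        return dt1  # the LMTD tends to dt1 as dt2 -> dt1
    return (dt1 - dt2) / math.log(dt1 / dt2)


def amtd(dt1: float, dt2: float) -> float:
    """Arithmetic mean of the two terminal temperature differences."""
    return 0.5 * (dt1 + dt2)


def reward(k: float, area: float, th_in: float, th_out: float,
           tc_in: float, tc_out: float, m_h: float, cp_h: float,
           mean_fn: Callable[[float, float], float]) -> float:
    """Hypothetical reward: negative relative error between the modelled duty
    K*A*dT_m and the measured hot-side duty (0 means a perfect match)."""
    q_meas = m_h * cp_h * (th_in - th_out)        # W, hot-side energy balance
    dt1, dt2 = th_in - tc_out, th_out - tc_in     # counterflow terminal differences (assumed)
    q_model = k * area * mean_fn(dt1, dt2)        # W, mechanistic rate equation
    return -abs(q_model - q_meas) / q_meas


def identify_k(candidates: Iterable[float], **operating_point) -> float:
    """Stand-in for the RL agent: return the candidate K with the highest reward."""
    return max(candidates, key=lambda k: reward(k, **operating_point))


# Hypothetical operating point: water on the hot side, A = 10 m^2.
point = dict(area=10.0, th_in=90.0, th_out=70.0, tc_in=20.0, tc_out=50.0,
             m_h=1.0, cp_h=4180.0)
k_lmtd = identify_k(range(100, 301), mean_fn=lmtd, **point)
k_amtd = identify_k(range(100, 301), mean_fn=amtd, **point)
print(k_lmtd, k_amtd)  # K in W/(m^2 K); prints: 187 186
```

Because the arithmetic mean of two positive numbers is never smaller than their logarithmic mean, the AMTD-based reward settles on a slightly smaller K for the same data. Which choice better reproduces the measured temperatures is an empirical question, which the reward-function study described above addresses.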