At present, most optimization objectives for heating, ventilation, and air-conditioning (HVAC) systems focus on the local optimization of equipment during cooling hours and usually ignore the importance of the end conditions. In addition, traditional control methods may not perform well in complex or dynamic systems. Therefore, in this study, a hybrid model-free control (HMFC) strategy was developed that combines deep reinforcement learning (DRL) with L-BFGS-B and expert knowledge to improve the controller's ability to cope with complex system environments. This strategy was used to optimize both the cooling and heating periods of a groundwater source heat pump system in a bitterly cold region of China over a full year, taking the end-of-system conditions into account. L-BFGS-B optimized the fresh-air ratios of the end air-handling unit, whereas DRL achieved global optimization of the heat pump outlet temperature, the air-conditioning water circulation pump frequency, and the submersible-pump frequency. To verify the effectiveness of the strategy, a simulation platform was built from actual data and device parameters, and the simulation results of HMFC and model predictive control (MPC) were compared with data measured from the system in 2022 under expert rule-based manual control (RBC). The results show that HMFC was 7.38 % and 9.38 % more energy efficient than RBC during heating hours, respectively. During cooling hours, HMFC was 24.26 % more energy efficient than RBC and 10.03 % more energy efficient than MPC. Moreover, the distribution of the system coefficient of performance under HMFC was more concentrated in the higher range, indicating that the proposed HMFC can save energy. Thus, it is a feasible optimization scheme for buildings that lack a large amount of historical data. Finally, the strategy was applied in a real engineering experiment, and the results show that it is robust and practical.
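The role of L-BFGS-B in the fresh-air ratio step can be illustrated with a minimal sketch. This is not the study's model: the cost function `ahu_energy_cost`, its coefficients, the bounds of 0.1 to 1.0, and the temperatures are hypothetical values chosen only to show how a bounded quasi-Newton search selects a fresh-air ratio. Only `scipy.optimize.minimize` with `method="L-BFGS-B"` is a real library call.

```python
import numpy as np
from scipy.optimize import minimize


def ahu_energy_cost(x, t_out, t_room, air_quality_weight=4.0):
    """Hypothetical surrogate cost for one air-handling unit (assumption)."""
    ratio = x[0]
    # Load from conditioning outdoor air grows with the fresh-air ratio.
    ventilation_load = ratio * abs(t_out - t_room)
    # Air-quality penalty grows as the fresh-air ratio shrinks.
    air_quality_penalty = air_quality_weight / (ratio + 0.05)
    return ventilation_load + air_quality_penalty


def optimize_fresh_air_ratio(t_out, t_room, lower=0.1, upper=1.0):
    """Minimize the surrogate cost over a box-constrained fresh-air ratio."""
    result = minimize(
        ahu_energy_cost,
        x0=np.array([0.5]),          # start from the middle of the range
        args=(t_out, t_room),
        method="L-BFGS-B",           # bounded limited-memory quasi-Newton
        bounds=[(lower, upper)],     # hypothetical damper limits
    )
    return float(result.x[0]), float(result.fun)


if __name__ == "__main__":
    for t_out in (26.0, 32.0, 40.0):
        ratio, cost = optimize_fresh_air_ratio(t_out, t_room=26.0)
        print(f"t_out={t_out:.0f} C -> fresh-air ratio {ratio:.3f}, cost {cost:.2f}")
```

Under these assumed costs, the selected ratio falls as the outdoor temperature moves away from the room setpoint, and it sits at the upper bound when outdoor air imposes no load. In a hybrid scheme such as the one described, a fast bounded search of this kind would cover the end-side variable while DRL handles the plant-side setpoints.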