As composite heating systems that combine solar energy with air-source heat pumps become widespread, regulating air temperature in heated zones, improving thermal comfort, and reducing energy consumption have become crucial. To maintain occupant comfort while minimizing HVAC energy costs, a control method that combines guided policy search (GPS) with model predictive control (MPC) is proposed for composite heating systems. The method replaces state estimation in MPC with policy search and uses MPC in place of the offline trajectory optimization in guided policy search, thereby integrating the strengths of both approaches. A simulation model was developed and validated on a real physical platform, and the GPS-MPC algorithm was benchmarked against reinforcement learning and MPC. Simulations compared the control strategies, including on-off control and MPC, in terms of pump frequency, flow rate, stratified thermal storage tank temperature, and component-level and system-level energy consumption. The results indicate that, compared with conventional methods, the GPS-MPC algorithm offers better predictive accuracy, efficiency, and robustness in composite heating control. Under the GPS-MPC strategy, indoor temperature fluctuation, pump frequency response speed, and energy consumption all improved significantly: temperature fluctuation was limited to 0.8 °C, and the system achieved energy savings of over 12%.