This paper proposes a novel optimal control method for multi-zone HVAC systems to enhance energy efficiency and improve occupant comfort. To address disturbances in outdoor weather and indoor loads, the proposed method formulates the HVAC control problem as a dynamic Markov decision process and employs deep reinforcement learning (DRL) techniques to obtain the optimal control policy. To manage the multiple objectives of thermal comfort, indoor air quality (IAQ), and system energy efficiency, a preference-inspired (P-ins) mechanism is designed to achieve the optimal balance among them. The P-ins mechanism effectively guides the agent towards the optimal control policy with a high convergence rate. The proposed method is validated on an EnergyPlus-Python co-simulation testbed with real-world data traces and assessed by an overall evaluation indicator. Results demonstrate that the proposed method can reduce energy consumption without compromising thermal comfort and IAQ. Specifically, the occurrence of temperature violations is reduced to below 0.8 %, and a maximum energy saving of 9.41 % is achieved compared with traditional methods.