With the growing use of renewable energy in building energy systems (BES), power grids increasingly require the BES coupled to them to support off-grid operation, which is one type of energy-flexible and grid-responsive operation. In this context, deep reinforcement learning (DRL) algorithms have gained increasing attention for the operation control of BES because of their strong fitting ability and model-free nature. However, mainstream DRL algorithms cannot solve reinforcement learning (RL) problems with hybrid action spaces, that is, action spaces that mix discrete and continuous actions, which restricts their further application in BES that include a variety of energy sources. In this paper, we use, for the first time, a multi-agent deep reinforcement learning (MADRL) algorithm to solve the RL problem of hybrid action spaces in the building controls domain. The proposed algorithm is validated on a measured dataset from a real office building in Japan. The results show that, compared with the baseline control logic currently in use, MADRL achieves a 60% improvement in off-grid operation tasks. For battery safety, MADRL reduces unsafe battery runtime by at least 80%. Furthermore, our experiments show that a single-agent DRL algorithm cannot solve the RL problem with hybrid action spaces, whereas the MADRL framework achieves stable training and optimization by layering both the problem and the agents.