Building energy simulation (BES) tools are fundamental for predicting energy performance and comfort. However, detailed models are computationally complex and require long simulation times, which makes parametric runs and numerical optimization difficult. Simulating numerous retrofit scenarios is hardly feasible, especially for multi-zone buildings with complex geometries. This paper introduces a novel approach to model order reduction (MOR) of BES models. The approach uses a deep learning-based, unsupervised convolutional neural network autoencoder (CNN-AE). The method decomposes complex time series data derived from detailed simulations into lower-dimensional features. These low-dimensional representations can be grouped by clustering algorithms to build a reduced-order model (ROM). The approach automatically finds archetype zones of the original model, each representing the energy behavior of a group of rooms, and removes the redundant zones. The energy demand of the whole building can then be estimated from these archetype zones. Our investigation shows that a CNN-AE can be applied efficiently to reduce complex building energy simulation models. As a proof of concept, a detailed model of a multi-zone campus building with 889 thermal zones is compared to the ROM derived from the CNN-AE. Comprehensive tuning of the autoencoder hyperparameters to optimize the accuracy of the model is provided. The ROM supports different purposes, such as the development of energy scenarios, with a total error of less than 1% compared to the original model, and it reduces simulation times by a factor of more than 16.
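The step from low-dimensional features to archetype zones can be illustrated with a minimal sketch. It is not the implementation used in this study: the latent features are hard-coded toy values standing in for the CNN-AE encoder output, plain k-means is assumed as the clustering algorithm, and the whole-building demand is assumed to be each archetype's demand scaled by the number of zones in its cluster. All names (`latent`, `demand`, `pick_archetypes`, `estimate_total`) and all numbers are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every zone to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def pick_archetypes(points, labels, centroids):
    """Per cluster, return the index of the zone closest to the centroid."""
    archetypes = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            archetypes[j] = min(members, key=lambda i: math.dist(points[i], c))
    return archetypes


def estimate_total(archetypes, labels, zone_demand):
    """Assumed aggregation: archetype demand times the size of its cluster."""
    return sum(zone_demand[i] * labels.count(j) for j, i in archetypes.items())


# Hypothetical 2-D latent features for six zones (stand-in for encoder output).
latent = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
          (1.0, 1.1), (1.1, 1.0), (1.05, 1.05)]
# Hypothetical annual demand per zone from the detailed model, in kWh.
demand = [100.0, 102.0, 101.0, 200.0, 204.0, 202.0]

labels, centroids = kmeans(latent, k=2)
archetypes = pick_archetypes(latent, labels, centroids)
estimate = estimate_total(archetypes, labels, demand)
relative_error = abs(estimate - sum(demand)) / sum(demand)
```

In this toy case only two of the six zones would need to be simulated. In practice the scaling would more plausibly use floor area or another zone property rather than a simple count, and the number of clusters would have to be chosen against the accuracy target.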