Energy consumption forecasting is essential for energy system integration and management. However, existing studies focus mainly on the temporal features of energy consumption and neglect the spatial correlations among variables that carry time information. Capturing these spatio-temporal relationships improves forecasting accuracy and, in turn, supports energy dispatch. To address this gap, an explainable Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) forecasting model is employed to predict total energy consumption by capturing the spatial and temporal features of multivariate time series. In the model, an autoencoder performs nonlinear dimensionality reduction, mapping the data to a low-dimensional space. A Convolutional Neural Network then extracts more effective features from the decoded data, and a long short-term memory network identifies the temporal dependencies between the extracted features and total energy consumption. Shapley additive explanations (SHAP) are introduced to interpret the outputs of the black-box model. Comparisons with conventional forecasting models verify the high accuracy and good adaptability of the proposed method. The method also provides insight into regional energy consumption by analyzing the contributions of weather variables, which helps administrators understand regional energy performance and improve energy efficiency.
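To make the attribution step more concrete, the following minimal sketch computes exact Shapley values for a single prediction. It is an illustration only, not the authors' implementation: `toy_model` is a hypothetical stand-in for the trained forecaster, the feature names and coefficients are invented, and "absent" features are simply replaced by a baseline value. Practical SHAP tooling typically approximates this calculation rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size (excluding feature i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Features outside the coalition take their baseline value.
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Accumulate the weighted marginal contribution of feature i.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical stand-in for the trained forecaster (assumption, not from the paper).
# Inputs: temperature, humidity, wind speed; coefficients are made up.
def toy_model(features):
    temperature, humidity, wind_speed = features
    return 50.0 + 2.0 * temperature + 0.5 * humidity + 0.1 * temperature * wind_speed


sample = [30.0, 60.0, 5.0]    # the input being explained
baseline = [20.0, 50.0, 3.0]  # reference input, e.g. typical conditions
phi = shapley_values(toy_model, sample, baseline)
```

The attributions in `phi` sum to `toy_model(sample) - toy_model(baseline)`, which is the additivity property that lets each weather variable's contribution to a forecast be read off directly. Exact enumeration grows exponentially with the number of features, so it is only practical for a handful of inputs.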