This study presents a deep-learning framework for predicting the thermal appearance of building envelopes under varying weather conditions. It is based on a new dataset collected with a thermal infrared camera at 10 min intervals over a one-and-a-half-year period. Unlike existing studies that rely on simulated data or physical models that do not always accurately reflect the complex heat transfer processes in real buildings, we collected a large dataset showing how a real building behaves under different climatic conditions. The proposed approach integrates weather data and thermal imagery to predict the temperature distribution on the building façade for the next 24 and 48 h. The model uses a state-of-the-art recurrent neural network architecture, PredRNN V2, with an action conditioning mechanism to incorporate weather forecast data into the prediction process. We evaluate this approach in terms of average accuracy, prediction accuracy in specific regions, and visual-perceptual performance of the images. The proposed framework achieves a root mean square error (RMSE) of 1.5 °C for the 24 h prediction and 2.04 °C for the 48 h prediction, outperforming baseline models in terms of both temperature prediction accuracy and structural similarity of the predicted images.
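As an illustration of the accuracy measures named above, the following minimal sketch computes RMSE over a whole thermal image and over a specific region. It assumes that predicted and observed frames are available as 2-D arrays of temperatures in °C and that a region of interest is given as a Boolean mask. The function name `rmse`, the mask name `region_mask`, and the synthetic values are hypothetical and are not taken from the study's implementation.

```python
import numpy as np


def rmse(predicted, observed, mask=None):
    """Root mean square error, in the units of the inputs (here assumed degrees C).

    If a Boolean mask is given, only pixels where the mask is True are scored.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same shape")

    squared_error = (predicted - observed) ** 2
    if mask is not None:
        # Boolean indexing keeps only the pixels inside the region of interest.
        squared_error = squared_error[np.asarray(mask, dtype=bool)]
    if squared_error.size == 0:
        raise ValueError("no pixels selected for evaluation")
    return float(np.sqrt(squared_error.mean()))


# Synthetic 4x4 frames (illustrative values only, not data from the study).
observed = np.full((4, 4), 20.0)
predicted = observed.copy()
predicted[:2, :2] += 2.0  # the top-left 2x2 block is over-predicted by 2 degrees

# Hypothetical region of interest covering that top-left block.
region_mask = np.zeros((4, 4), dtype=bool)
region_mask[:2, :2] = True

overall_rmse = rmse(predicted, observed)
regional_rmse = rmse(predicted, observed, region_mask)
print(f"overall RMSE:  {overall_rmse:.2f}")
print(f"regional RMSE: {regional_rmse:.2f}")
```

Scoring a region separately shows how a localized error can be diluted in the image-wide average: in this synthetic case the error is confined to a quarter of the pixels, so the regional RMSE is larger than the overall one.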