Accurate prediction of indoor temperature is critical for climate change adaptation and occupant health. This study investigates an improved deep ensemble machine learning (DEML) framework, built by adapting the model architecture with several machine learning (ML) and deep learning (DL) approaches, for forecasting sensor-based indoor temperature in Australian urban environments. We collected ambient station-based temperatures, satellite-based outdoor climate characteristics, and low-cost sensor-based indoor environmental metrics from 96 devices between August 2019 and November 2022, and fitted DEML with a rolling-window approach to assess the stability of its predictions over time. DEML was compared with five benchmark models: Random Forest (RF), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and Super Learner (SL). In total, 13,715 days [median: 341 days; interquartile range (IQR): 221–977 days] of low-cost sensor-based indoor temperature records from 25 commercial and residential buildings across eight cities were included. DEML outperformed the five benchmark models for most sensors [coefficient of determination (R²) of 0.861–0.990 and root mean square error (RMSE) of 0.125–0.886 °C], followed by the RF and SL algorithms. DEML consistently achieved high accuracy across climate zones, seasons, and building types, and could serve as a tool for optimizing energy use, maintaining occupant comfort and health, and adapting to the impacts of climate change.
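
The rolling-window evaluation and the two reported metrics (R² and RMSE) can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the window lengths, the ordinary least squares stand-in model (used here in place of DEML), and the synthetic outdoor and indoor series are all assumptions made for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (here degrees Celsius)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rolling_window_scores(X, y, train_size, test_size):
    """Slide a fixed-length training window forward in time and
    score the model on the block of days that follows each window."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    scores = []
    start = 0
    while start + train_size + test_size <= len(y):
        tr = slice(start, start + train_size)
        te = slice(start + train_size, start + train_size + test_size)
        # Stand-in model (assumption): least squares with an intercept, NOT DEML.
        A_tr = np.column_stack([np.ones(train_size), X[tr]])
        coef, *_ = np.linalg.lstsq(A_tr, y[tr], rcond=None)
        A_te = np.column_stack([np.ones(test_size), X[te]])
        pred = A_te @ coef
        scores.append({"start": start,
                       "rmse": rmse(y[te], pred),
                       "r2": r2(y[te], pred)})
        start += test_size  # advance the window by one test block
    return scores

# Synthetic daily series (assumption, for demonstration only).
rng = np.random.default_rng(0)
days = np.arange(400)
outdoor = 20 + 8 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, days.size)
indoor = 12 + 0.5 * outdoor + rng.normal(0, 0.3, days.size)

scores = rolling_window_scores(outdoor.reshape(-1, 1), indoor,
                               train_size=180, test_size=30)
for s in scores:
    print(s)
```

Each dictionary holds the scores for one window, so the spread of RMSE and R² across windows gives a rough view of how stable the predictions are over time. Any model with a fit and predict step could replace the least squares stand-in.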