As the impact of climate change on cities intensifies, the analysis and modeling of the urban microclimate are becoming increasingly important. A key quantity in this analysis is the urban surface temperature. Traditional approaches to estimating it rely on urban micrometeorology models. These models run computationally intensive simulations and require complex inputs, including urban geometry, radiative parameters, and meteorological conditions. As a result, they cannot readily be applied in every city, since some input parameters may be missing and the necessary computational resources may be unavailable. In this paper, we propose an alternative approach based on a deep learning framework that requires markedly simpler inputs: street view imagery and meteorological conditions. We use our model to estimate building facade surface temperatures in the city of London (Ontario, Canada). We compare the results both with values simulated by TUF-3D, an established simulation software, and with ground truth data collected during an on-site campaign using a FLIR thermal imager. The results show that our approach outperforms TUF-3D while also requiring simpler inputs and fewer computational resources. Street view imagery is becoming ubiquitous worldwide, for instance through platforms such as Google Street View. Our approach therefore lays the foundation for a new paradigm of fast, cost-effective, and highly scalable models that can help urban designers and local authorities better understand a changing urban climate.