Data-driven models are valuable tools for understanding and tackling the urban heat island (UHI) phenomenon, and they can give urban planners user-friendly platforms for incorporating UHI considerations into their decisions. This study assesses the generalizability of data-driven UHI models at street-level resolution, focusing on how the degree of similarity between the urban contexts of training and testing cities affects performance. Five cities from three countries were selected to cover a diverse range of similarities, and one Random Forest model was developed for each city. The lowest-performing model has an R² of 0.56 and an MAE of 0.07, and the highest-performing model has an R² of 0.71 and an MAE of 0.05. While these models proved accurate for the cities they were trained on, validating them on the other cities revealed low generalizability, irrespective of the degree of similarity between the training and testing datasets. Small changes in feature importance resulted in significant variation in UHI derivation mechanisms and behavior, which contributes to the models' low generalizability. These findings indicate that universal mitigation strategies may not yield consistent outcomes worldwide and that a one-size-fits-all approach may be inefficient in addressing UHI. Hence, it is vital to tackle UHI locally.
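The cross-city evaluation described above can be illustrated with a minimal sketch: fit one model per city, then score it on every city using R² and MAE. This is not the study's pipeline. The one-variable least-squares fit is a deliberately simple stand-in for the Random Forest models, and the city names, data values, and function names are all hypothetical. The same-city scores here are computed on the training data, whereas a real evaluation would use a held-out split.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in model (NOT a Random Forest): 1-D least squares.

    Returns a function mapping a feature value to a prediction.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_city_scores(cities, fit=fit_line):
    """Train on each city, test on every city.

    cities: dict mapping city name -> (feature values, target values).
    Returns a dict mapping (train_city, test_city) -> (R2, MAE).
    """
    scores = {}
    for train_name, (x_train, y_train) in cities.items():
        predict = fit(x_train, y_train)
        for test_name, (x_test, y_test) in cities.items():
            y_hat = [predict(x) for x in x_test]
            scores[(train_name, test_name)] = (r2(y_test, y_hat), mae(y_test, y_hat))
    return scores


# Hypothetical toy data: the feature relates to the target in opposite
# directions in the two cities, so a model fit on one transfers poorly.
cities = {
    "city_a": ([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]),
    "city_b": ([0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]),
}
scores = cross_city_scores(cities)
for (train_name, test_name), (score_r2, score_mae) in scores.items():
    print(f"train={train_name} test={test_name} R2={score_r2:.2f} MAE={score_mae:.2f}")
```

In this toy setup the diagonal entries (same train and test city) score well while the off-diagonal entries have negative R², which is the kind of pattern a low-generalizability result would show in the score matrix.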