The 2-m air temperature is a key indicator for studying weather extremes and the exchange of water and energy between the surface and the atmosphere. This study compared three reanalysis datasets (ERA5, ERA5-Land, and MERRA-2) with in-situ observations from 1990 to 2022 over the Arabian Peninsula (AP), using several statistical error metrics and extreme temperature indices. We selected these datasets because of their continuous improvement and their high spatiotemporal resolution, which help capture temperature variability. The spatiotemporal climatology shows the lowest temperatures in winter (<15 °C) and the highest in summer (>35 °C); however, the reanalysis data deviate more from observations in the cold season than in the warm season. The reanalysis data underestimated the frequency of cold (<10 °C) and hot (>30 °C) days across the four regions, except for ERA5-Land, which closely followed the observed data. Correlations with observations are stronger (>0.90) for the frequency of cold extremes (cold nights and cold days) than for hot extremes. On an interannual scale, the reanalysis products correlate strongly (>0.90) with in-situ data across most regions, particularly in winter and autumn; the correlations are moderate in spring and weaker in summer. The reanalysis data show negative biases in inland regions and positive biases in coastal areas, with root mean square differences (RMSD) that are consistent in space and time. The differences in performance are attributed to topography and the poor representation of energy fluxes, especially in MERRA-2, as well as to missing data in the observations. This study recommends ERA5-Land as the first choice for extreme weather simulations in the region, followed by MERRA-2 and ERA5, which rank equally, although care is needed when using reanalysis data for cold and hot extremes.
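
As a rough illustration of the kind of comparison summarized above, the following minimal Python sketch computes the mean bias, RMSD, and Pearson correlation between a reanalysis series and a co-located station series, and counts cold (<10 °C) and hot (>30 °C) days. The function names, the NaN handling for missing observations, and the sample values are assumptions made for illustration only; they are not the study's actual processing chain, and the sketch does not specify which daily temperature variable the thresholds apply to.

```python
import numpy as np


def error_metrics(reanalysis, observed):
    """Return (mean bias, RMSD, Pearson r) for paired temperature series.

    Hypothetical helper: pairs with a missing value (NaN) in either
    series are dropped before the metrics are computed.
    """
    rea = np.asarray(reanalysis, dtype=float)
    obs = np.asarray(observed, dtype=float)
    valid = ~(np.isnan(rea) | np.isnan(obs))
    rea, obs = rea[valid], obs[valid]
    diff = rea - obs                      # reanalysis minus observation
    bias = diff.mean()                    # negative means underestimation
    rmsd = np.sqrt((diff ** 2).mean())    # root mean square difference
    corr = np.corrcoef(rea, obs)[0, 1]    # Pearson correlation coefficient
    return bias, rmsd, corr


def extreme_day_counts(temps, cold_threshold=10.0, hot_threshold=30.0):
    """Count days below the cold and above the hot threshold (in °C).

    The default thresholds follow the values quoted in the text;
    missing days (NaN) are ignored.
    """
    t = np.asarray(temps, dtype=float)
    t = t[~np.isnan(t)]
    return int((t < cold_threshold).sum()), int((t > hot_threshold).sum())


# Made-up daily temperatures (°C), for illustration only.
obs = np.array([8.0, 12.0, 25.0, 31.0, 36.0, np.nan])
rea = np.array([9.0, 11.0, 24.0, 30.0, 34.0, 20.0])

bias, rmsd, corr = error_metrics(rea, obs)
obs_counts = extreme_day_counts(obs)
rea_counts = extreme_day_counts(rea)
print(f"bias={bias:.2f} °C, RMSD={rmsd:.2f} °C, r={corr:.3f}")
print(f"cold/hot days: observed={obs_counts}, reanalysis={rea_counts}")
```

In this toy example the reanalysis series counts fewer hot days than the station series, which is the type of underestimation described above; real evaluations would apply the same calculations per station, season, and region.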