Abstract
This paper reconciles state‐of‐the‐art observations and simulations of evapotranspiration (ET) temporal variability through a diagnostic framework composed of an observation‐model‐theory triplet. Specifically, a confirmed theoretical tool, Evapotranspiration Temporal VARiance Decomposition (EVARD), is used as a benchmark to estimate ET monthly variance across the contiguous United States (CONUS) with inputs including hydroclimatic observations, Gravity Recovery and Climate Experiment‐based terrestrial water storage, four observation‐based products (ETRSUW by the University of Washington, ETRSMOD16 from the MOD16 Global Terrestrial ET Data Set, ETFLUXNET upscaled from flux tower observations, and ETGLEAM from the Global Land Evaporation Amsterdam Model), and four operational land surface models (LSMs: MOSAIC, NOAH, NOAH‐MP, and VIC). Five experiments are systematically designed to evaluate and diagnose possible errors and uncertainties in the ET temporal variance estimated by the four observation‐based ET products and the four LSM simulations. Based on the results of these experiments, the following diagnostic hypotheses regarding the uncertainty of the observation‐based ET products are illustrated: ETRSUW captures the high ET variance signals in the Midwest with negligible bias and moderate uncertainty over CONUS; ETFLUXNET systematically underestimates ET variance over CONUS but has the lowest level of uncertainty; ETRSMOD16 has medium bias and the highest level of uncertainty, and its spatial distribution of high ET variance signals differs from that of the other estimates; ETGLEAM has a slight negative bias and medium uncertainty, and its ET variance along the West Coast is smaller than the EVARD estimate. Regarding the LSMs, any of the four can perform best, depending on which set of reference observations is used.
The study reveals that LSMs have a reasonable, though imperfect, capability to estimate ET and its variability in regions/aquifers with limited human interference. However, RS‐based observations and theoretical estimates suggest that none of the four LSMs examined in this study can accurately predict ET variability in regions/aquifers heavily influenced by human activities, such as the Central Valley and High Plains aquifers. All four underestimate ET variability along the West Coast, which is attributed to seasonal vegetation responses to the Mediterranean climate and to human water use. In addition, the LSMs underestimate intra‐annual ET variance in California and the High Plains because they underestimate the terrestrial storage change components of ET variance, owing to inadequate representation of groundwater pumping and its impact on ET and other hydrologic processes. This paper urges advancing hydrologic knowledge by finding congruence among models, data, and theories.