ABSTRACT Owing to inherent limitations of satellite sensors, balancing the temporal and spatial resolutions of remote sensing observations remains a challenge. Spatiotemporal fusion has emerged as a viable approach to address this issue. Many spatiotemporal fusion models were originally proposed to fuse surface reflectance; however, their adaptation to land surface temperature (LST), particularly under the ‘fusion-then-retrieval (FTR)’ or ‘retrieval-then-fusion (RTF)’ strategy, has not yet been comprehensively explored. This paper presents a comparative study of the two strategies in the context of LST fusion using three classic fusion methods: the spatial and temporal adaptive reflectance fusion model (STARFM), the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM), and the flexible spatiotemporal data fusion model (FSDAF). The findings show that the applicability of the strategies differs across models: (1) the FTR strategy is more suitable for STARFM; (2) for ESTARFM and FSDAF, the more suitable strategy depends strongly on the reference images; (3) the STARFM and FSDAF models are sensitive to the time interval between the prediction date and the reference date. This work is expected to provide practical guidance for synthesizing LST data with high spatiotemporal resolution using fusion models.