Tradeoffs between the spatial and temporal resolutions of remote sensing instruments limit their capacity to monitor the eutrophication status of inland lakes. Spatiotemporal fusion (STF) offers a cost-effective way to obtain remote sensing data with both high spatial and high temporal resolution by blending multisensor observations. However, remote sensing reflectance (Rrs) over water surfaces has a relatively low signal-to-noise ratio and is therefore prone to large uncertainties in the fusion process. To comprehensively analyze the influence of processing and modeling errors, we conducted an evaluation study of the potential, uncertainties, and limitations of STF for monitoring chlorophyll a (Chla) concentration in an inland eutrophic lake (Chaohu Lake, China). Specifically, comparative tests were conducted on Sentinel-2 and Sentinel-3 image pairs. Three typical STF methods were compared: Fit-FC, the spatial and temporal nonlocal filter-based fusion model (STNLFFM), and the flexible spatiotemporal data fusion (FSDAF) method. The results show that (a) among the influencing factors, atmospheric correction uncertainties and geometric misregistration have a larger impact on the fusion results than radiometric bias between the imaging sensors and STF modeling errors; and (b) the accuracy of machine-learning-based Chla inversion from the fused data [R² = 0.846 and root mean square error (RMSE) = 17.835 μg/L] is comparable with that from real Sentinel-2 data (R² = 0.856 and RMSE = 16.601 μg/L), and temporally dense Chla results can be produced by combining the Sentinel-2 and fused image datasets. These findings can help guide the design of STF frameworks for monitoring the aquatic environment of inland waters with remote sensing data.
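To make the fusion idea and the reported accuracy metrics concrete, the following Python sketch shows (i) the simplest change-based fusion baseline, which adds the temporal change seen by the coarse sensor (e.g., Sentinel-3) to a fine image (e.g., Sentinel-2) acquired at an earlier date, and (ii) how R² and RMSE are computed. It is a hypothetical illustration, assuming co-registered, atmospherically corrected images and an integer scale factor between the two grids. It is not Fit-FC, STNLFFM, or FSDAF, which add steps such as local regression, spatial filtering, and residual compensation. The function names are illustrative only. R² is computed here as the coefficient of determination; some studies instead report the squared Pearson correlation.

```python
import numpy as np


def upsample_nearest(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Replicate each coarse pixel into a factor x factor block of fine pixels."""
    return np.kron(coarse, np.ones((factor, factor), dtype=coarse.dtype))


def naive_stf(fine_t1: np.ndarray, coarse_t1: np.ndarray,
              coarse_t2: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical change-based baseline (not Fit-FC, STNLFFM, or FSDAF).

    Predicts the fine image at t2 as the fine image at t1 plus the
    coarse-resolution temporal change, upsampled to the fine grid.
    """
    delta = upsample_nearest(coarse_t2 - coarse_t1, factor)
    return fine_t1 + delta


def r2_rmse(observed, predicted):
    """Return (coefficient of determination, root mean square error)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = observed - predicted
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return float(r2), rmse


if __name__ == "__main__":
    # Synthetic 4 x 4 fine image and its 2 x 2 block-mean coarse version.
    fine_t1 = np.arange(16, dtype=float).reshape(4, 4)
    coarse_t1 = fine_t1.reshape(2, 2, 2, 2).mean(axis=(1, 3))
    # Uniform change of +0.5 observed by the coarse sensor at t2.
    coarse_t2 = coarse_t1 + 0.5
    fine_t2_pred = naive_stf(fine_t1, coarse_t1, coarse_t2, factor=2)
    print(fine_t2_pred)
    print(r2_rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Because the baseline adds the coarse-image difference directly to the fine image, any atmospheric correction error or geometric misregistration between the coarse images at t1 and t2 propagates unchanged into every fine pixel of the prediction, which is one intuitive reason such preprocessing errors can dominate the fusion error budget.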