Mangrove forests, a critical component of blue carbon ecosystems, make a significant contribution to mitigating climate change and maintaining biodiversity. In China, however, they have suffered severe invasion by Spartina alterniflora (S. alterniflora). Accurate information on the spatial distribution of mangroves and S. alterniflora is a prerequisite for protecting indigenous mangrove ecosystems and managing S. alterniflora invasion. However, classification performance is often constrained by two major issues: (1) a single-date image, regardless of acquisition time, rarely provides optimal spectral separability among evergreen mangroves, deciduous S. alterniflora and mudflats; (2) because the invasion process is gradual, mixtures of land cover classes are prevalent in moderate-resolution images. High-resolution (finer than 2 m) image time series could effectively address both issues. However, owing to frequent cloud cover in coastal areas and the long revisit times of high-resolution satellite sensors, collecting high-resolution image time series remains a great challenge, not to mention the substantial expense. In this study, we proposed a new multi-temporal classification approach to fine-scale mapping of exotic S. alterniflora and native mangroves using synthetic high-resolution image time series derived from spatiotemporal fusion of WorldView-2 and Sentinel-2 data. A total of six synthetic WorldView-2-like images in different phenological periods were produced by the Flexible Spatiotemporal DAta Fusion (FSDAF) method. From the combination of the original and synthetic images, two types of features, image-based multi-temporal features and pixel-based image compositing features, were extracted and evaluated in the subsequent Random Forest classifications. Monte Carlo cross-validation was adopted to provide robust accuracy estimates, and a spatially explicit confidence map was generated to show classification uncertainty and potential errors.
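As a rough illustration of the Monte Carlo cross-validation mentioned above, the sketch below repeats random train/test splits and summarizes the resulting overall accuracies. It is not the study's implementation: the number of iterations, the test fraction, the two-band toy feature values and the `nearest_centroid` stand-in classifier (used here in place of Random Forest to keep the example dependency-free) are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def monte_carlo_cv(samples, labels, fit_predict, n_iter=100, test_fraction=0.3, seed=0):
    """Repeat random train/test splits; return mean and std of overall accuracy."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, round(test_fraction * len(indices)))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)  # a fresh random partition in every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predicted = fit_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        hits = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(hits / n_test)
    return mean(accuracies), stdev(accuracies)


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(train_x, train_y, test_x):
    """Toy stand-in classifier (NOT Random Forest): assign the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return [min(centroids, key=lambda y: squared_distance(x, centroids[y])) for x in test_x]


# Made-up two-feature samples (hypothetical values, not data from the study).
features = [[0.80 + 0.01 * i, 0.75 + 0.01 * i] for i in range(10)]
features += [[0.70 + 0.01 * i, 0.20 + 0.01 * i] for i in range(10)]
classes = ["mangrove"] * 10 + ["spartina"] * 10

mean_acc, sd_acc = monte_carlo_cv(features, classes, nearest_centroid, n_iter=50)
print(f"overall accuracy: {mean_acc:.3f} +/- {sd_acc:.3f}")
```

Reporting the spread across iterations alongside the mean is what makes the estimate less sensitive to any single, possibly unrepresentative, split of the reference samples.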
To further demonstrate the necessity of generating synthetic WorldView-2 images, the entire classification process was repeated using Sentinel-2 images at 10-m spatial resolution, and the fine- and moderate-resolution classification results were compared in terms of classification accuracies, classification maps and fractional abundances. Our results show that the spectral profiles of S. alterniflora and mangroves generated from the synthetic WorldView-2 images were consistent with those from the paired Sentinel-2 images. Meanwhile, the spatial details of high landscape heterogeneity, unrecognizable in the Sentinel-2 images, were revealed by the synthetic WorldView-2 images. Compared to the single-date classification with the original WorldView-2 image, most of the multi-temporal classifications using both the original and synthetic images provided more accurate results, with the overall accuracy raised by up to 3.9%. Among the various temporal features adopted in this study, the image-based, stacked spectral reflectance and vegetation index features of all original and synthetic images produced the best classification performance, increasing the producer's and user's accuracies of mangroves, S. alterniflora and mudflats by 2.5%–7.1%. Finally, the Sentinel-2 classification accuracies were generally lower than their WorldView-2 counterparts across all temporal features, with a 6% difference in the highest overall accuracy between the two. Prediction errors of the moderate-resolution classification were often found in heterogeneous landscapes, where S. alterniflora was mixed with mangroves or mudflats.
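For readers less familiar with the accuracy measures reported above, the following sketch computes the overall, producer's and user's accuracies from paired reference and predicted labels, using the standard definitions (producer's accuracy: correct samples divided by the reference total of a class; user's accuracy: correct samples divided by the mapped total of a class). The function name `accuracy_report` and the ten toy labels are hypothetical and are not results from the study.

```python
def accuracy_report(reference, predicted):
    """Return overall accuracy plus per-class producer's and user's accuracies."""
    classes = sorted(set(reference) | set(predicted))
    # confusion[r][p] counts samples of reference class r that were mapped as class p
    confusion = {r: {p: 0 for p in classes} for r in classes}
    for r, p in zip(reference, predicted):
        confusion[r][p] += 1

    overall = sum(confusion[c][c] for c in classes) / len(reference)
    producers, users = {}, {}
    for c in classes:
        reference_total = sum(confusion[c].values())          # row total
        mapped_total = sum(confusion[r][c] for r in classes)  # column total
        producers[c] = confusion[c][c] / reference_total if reference_total else float("nan")
        users[c] = confusion[c][c] / mapped_total if mapped_total else float("nan")
    return overall, producers, users


# Made-up labels for illustration only.
reference = ["mangrove"] * 4 + ["spartina"] * 4 + ["mudflat"] * 2
predicted = [
    "mangrove", "mangrove", "mangrove", "spartina",
    "spartina", "spartina", "spartina", "mudflat",
    "mudflat", "mudflat",
]

overall, producers, users = accuracy_report(reference, predicted)
print(f"overall accuracy: {overall:.2f}")
for name in producers:
    print(f"{name}: producer's {producers[name]:.2f}, user's {users[name]:.2f}")
```

In this toy case the mudflat class has a perfect producer's accuracy but a lower user's accuracy, because one S. alterniflora sample was mapped as mudflat; this is the kind of asymmetry the two measures are meant to expose.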
Considering the availability of moderate-resolution data with high revisit frequencies, such as Sentinel-2A/B and Landsat 8/9, future fine-scale mapping exercises might benefit from the multi-temporal classification approach with spatiotemporal fusion of moderate- and high-resolution data, which alleviates the scarcity of high-resolution data in cloudy areas.