Abstract. The South China Sea (SCS) is the largest marginal sea of the North Pacific Ocean, where intensive field observations, including mappings of the sea surface partial pressure of CO2 (pCO2), have been conducted over the last 2 decades. It is one of the most studied marginal seas in terms of carbon cycling and could thus be a model system for marginal sea carbon research. However, the cruise-based sea surface pCO2 datasets are still temporally and spatially sparse. Using a machine-learning-based method facilitated by empirical orthogonal function (EOF) analysis, this study provides a reconstructed dataset of the monthly sea surface pCO2 in the SCS with a reasonably high spatial resolution (0.05° × 0.05°) and temporal coverage between 2003 and 2020. The data input to our model includes remote-sensing-derived sea surface salinity, sea surface temperature, and chlorophyll, the spatial pattern of pCO2 constrained by EOF, atmospheric pCO2, and time labels (month). We validated our reconstruction with three independent testing datasets that are not involved in the model training. Among them, Test 1 includes 10 % of our in situ data, Test 2 contains four independent in situ datasets corresponding to the four seasons, and Test 3 is an in situ monthly dataset available from 2003–2019 at the South East Asia Time-series Study (SEATs) station located in the northern basin of the SCS. Our Test 1 validation demonstrated that the reconstructed pCO2 field successfully simulated the spatial and temporal patterns of sea surface pCO2 observations. The root mean square error (RMSE) between our reconstructed data and in situ data in Test 1 averaged ∼10 µatm, which is much smaller (by ∼50 %) than that between the remote-sensing-derived data and in situ data. Test 2 verified the accuracy of our retrieval algorithm in months lacking observations, showing a relatively small error (RMSE of ∼8 µatm).
Test 3 evaluated the accuracy of the reconstructed long-term trend, showing that, at the SEATs station, the difference between the reconstructed pCO2 and in situ data ranged from −10 to 4 µatm (−2.5 % to 1 %). In addition to the typical machine learning performance metrics, we assessed the uncertainty resulting from reconstruction bias and its feature sensitivity. These validations and uncertainty analyses strongly suggest that our reconstruction effectively captures the main spatial and temporal features of sea surface pCO2 distributions in the SCS. Using the reconstructed dataset, we show the long-term trends of sea surface pCO2 in five subregions of the SCS with differing physico-biogeochemical characteristics. We show that mesoscale processes such as the Pearl River plume and China coastal currents significantly impact sea surface pCO2 in the SCS during different seasons. While the SCS is overall a weak source of atmospheric CO2, the northern SCS acts as a sink, showing a trend of increasing strength over the past 2 decades. The data used in this article are available at https://doi.org/10.57760/sciencedb.02050 (Wang and Dai, 2022).
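For readers unfamiliar with the two quantities named in the abstract, the sketch below illustrates a generic EOF decomposition (computed here via singular value decomposition of the time-mean-removed field) and the RMSE metric used to compare a reconstruction with observations. It is a minimal illustration on synthetic data, not the authors' reconstruction model: the function names `eof_decompose` and `rmse`, the grid size, and the synthetic pCO2-like field are all assumptions introduced for this example.

```python
import numpy as np


def eof_decompose(field, n_modes):
    """Generic EOF analysis of a (n_time, n_space) field via SVD.

    Hypothetical helper for illustration; returns the leading spatial
    patterns, their principal-component time series, and the fraction
    of variance each retained mode explains.
    """
    # Remove the time mean at every grid point to obtain anomalies.
    anomalies = field - field.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    patterns = vt[:n_modes]               # shape (n_modes, n_space)
    pcs = u[:, :n_modes] * s[:n_modes]    # shape (n_time, n_modes)
    return patterns, pcs, variance_fraction[:n_modes]


def rmse(predicted, observed):
    """Root mean square error between two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Synthetic stand-in for a monthly field (values loosely in µatm);
# these numbers are made up and do not come from the dataset.
rng = np.random.default_rng(0)
n_time, n_space = 24, 50
spatial = np.sin(np.linspace(0.0, np.pi, n_space))
seasonal = np.cos(2.0 * np.pi * np.arange(n_time) / 12.0)
field = (380.0
         + 15.0 * np.outer(seasonal, spatial)
         + rng.normal(0.0, 1.0, size=(n_time, n_space)))

patterns, pcs, frac = eof_decompose(field, n_modes=2)

# Rebuild the field from the two leading modes and score it.
reconstruction = field.mean(axis=0) + pcs @ patterns
print("variance explained by leading modes:", frac)
print("RMSE of 2-mode reconstruction:", rmse(reconstruction, field))
```

In this toy setting the leading mode carries most of the variance because the synthetic field was built from a single seasonal pattern plus noise; real sea surface pCO2 fields would generally need more modes and a far more careful treatment of data gaps.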