The importance of phosphorus (P) in regulating ecosystem responses to climate change has fostered the implementation of P cycling in land surface models, but these models' predictions of CO2 effects have not been evaluated against measurements. Here, we perform a data-driven model evaluation in which simulations from eight widely used P-enabled models are confronted with observations from a long-term free-air CO2 enrichment experiment in a mature, P-limited Eucalyptus forest. We show that most models predicted the correct sign and magnitude of the CO2 effect on ecosystem carbon (C) sequestration, but they generally overestimated the effects on plant C uptake and growth. We identify leaf-to-canopy scaling of photosynthesis, plant tissue stoichiometry, plant belowground C allocation, and the subsequent consequences for plant-microbial interactions as key areas in which models of ecosystem C-P interaction can be improved. Together, these results from the data-model intercomparison provide insights into the performance and functionality of P-enabled models and add to the existing evidence that models overestimate the global CO2-driven carbon sink.