Abstract

The ocean is a major carbon sink and takes up 25%–30% of the anthropogenically emitted CO2. Global ocean biogeochemistry models (GOBMs) are a state‐of‐the‐art method to quantify this sink, but their simulated CO2 uptake differs between models and is systematically lower than estimates based on statistical methods using surface ocean pCO2 and interior ocean measurements. Here, we provide an in‐depth evaluation of ocean carbon sink estimates from 1980 to 2018 from a GOBM ensemble. Our study identifies three sources of inter‐model differences and ensemble‐mean biases: (a) the model setup, such as the length of the spin‐up, the starting date of the simulation, and carbon fluxes from rivers and into sediments, (b) the simulated ocean circulation, such as Atlantic Meridional Overturning Circulation and Southern Ocean mode and intermediate water formation, and (c) the simulated oceanic buffer capacity. Our analysis suggests that a late starting date and biases in the ocean circulation cause anthropogenic CO2 uptake to be too low across the GOBM ensemble. Surface ocean biogeochemistry biases might also cause simulated anthropogenic fluxes to be too low, but the current setup prevents a robust assessment. For simulations of the ocean carbon sink, we recommend in the short term to (a) start simulations at a common date before industrialization and the associated atmospheric CO2 increase, (b) conduct a spin‐up long enough for the GOBMs to reach steady state, and (c) provide key metrics for circulation, biogeochemistry, and the land‐ocean interface. In the long term, we recommend improving the representation of these metrics in the GOBMs.