Land use is a key driver of ecosystem service value (ESV), and forecasting future land use change and the corresponding ESV response can guide urban planning and sustainable development decisions. However, the traditional cellular automata (CA) model assumes that each cell holds exactly one land use type at each time step. It therefore neglects the mixed structure and proportional composition of land use units, cannot represent continuous quantitative change within a cell, and does not address the optimization of land use quantity structure and spatial pattern. This study coupled a novel mixed-cell cellular automata (MCCA) approach with a system dynamics (SD) model to predict the spatiotemporal pattern of land use in Xi’an, China, in 2030 under three scenarios: a natural increase scenario (NIS), an economic development scenario (EDS), and an ecological protection scenario (EPS). The equivalent coefficient method was used to examine the spatial heterogeneity and sensitivity of ESV. The results showed that SD-MCCA achieved high prediction accuracy and robustness. The main land use changes during 2000–2015 were urban expansion, the conversion of arable land to construction land, and the conversion between grassland and arable land. The total ESV increased from 19554.36 × 10⁶ CNY in 2000 to 19618.39 × 10⁶ CNY under the EPS in 2030, and climate regulation and hydrological regulation contributed the most to ESV. The spatial distribution of ESV showed a degree of regularity: high-value areas were concentrated in woodland and grassland with favorable ecological conditions. Land use change under the NIS and EPS increased ESV, whereas ESV responded negatively to land use transformation under the EDS. This research offers a new way to relate future land use scenarios to ESV, which can inform land resource management and the formulation of ecological compensation standards.
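As a rough illustration of how the equivalent coefficient method can be applied to mixed cells, the sketch below computes ESV per cell as the area-weighted sum of per-type value coefficients. It assumes each cell stores the proportion of every land use type. The land use categories, the coefficient values, and the function name `cell_esv` are hypothetical placeholders chosen for the example, not the values or code used in this study.

```python
import numpy as np

# Hypothetical value coefficients in CNY per hectare per year.
# Placeholders for illustration only; not taken from the study.
VALUE_COEFF = {
    "arable": 1.0,
    "woodland": 5.0,
    "grassland": 3.0,
    "water": 8.0,
    "construction": 0.0,
}
LAND_TYPES = list(VALUE_COEFF)


def cell_esv(proportions, cell_area_ha):
    """Return ESV per cell for a mixed-cell land use grid.

    proportions: array of shape (n_cells, n_types); each row holds the
                 share of every land use type in one cell and sums to 1.
    cell_area_ha: area of a single cell in hectares.
    """
    proportions = np.asarray(proportions, dtype=float)
    if proportions.shape[1] != len(LAND_TYPES):
        raise ValueError("one column per land use type is required")
    if not np.allclose(proportions.sum(axis=1), 1.0):
        raise ValueError("land use proportions in each cell must sum to 1")
    coeffs = np.array([VALUE_COEFF[t] for t in LAND_TYPES])
    # Area of each type in the cell times its coefficient, summed over types.
    return cell_area_ha * (proportions @ coeffs)


if __name__ == "__main__":
    grid = [
        [0.5, 0.5, 0.0, 0.0, 0.0],  # half arable, half woodland
        [0.0, 0.0, 0.0, 0.0, 1.0],  # fully built-up
    ]
    per_cell = cell_esv(grid, cell_area_ha=2.0)
    print(per_cell, per_cell.sum())
```

A single-type CA cell is the special case in which one proportion equals 1 and the rest are 0, so the same function covers both representations.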