Land use (LU) changes caused by urbanization, climate, and anthropogenic activities alter the supply of ecosystem services (ES), which in turn affects the ecological service value (ESV) of a region. Existing LU simulation models extract neighborhood effects from a single time slice of data, which ignores long-term dependence in neighborhood interactions. In addition, studies of the dynamic relationship between LU change and ES are rarer for semi-arid areas than for humid coastal areas. Here, we selected Lanzhou, a semi-arid city in Northwest China, as the study area and simulated LU changes in 2030 under natural growth (NG), ecological protection (EP), economic development (ED), and ecological protection-economic development (EPD) scenarios, using a novel deep learning method named CL-CA. A convolutional neural network and long short-term memory network (CNN-LSTM) was combined with cellular automata (CA) to extract spatiotemporal neighborhood features. The overall simulation performance of the proposed model exceeded 0.92, surpassing that of LSTM-CA, artificial neural network (ANN)-CA, and recursive neural network (RNN)-CA. Finally, we used LU and ES to quantitatively evaluate ESV changes. The results indicated that: (1) the variation trend of ESV in arid areas differs from that in humid coastal areas; (2) forest land and water were the main factors affecting ESV change; and (3) the EPD scenario was more suitable for sustainable urban development.