Climate change and human activities have significantly affected long-term vegetation growth, thereby altering ecosystem response mechanisms. The Yellow River Water Conservation Area (YRWCA) is a critical ecological functional zone in China. Vegetation in the YRWCA has changed significantly since 1982, and climate change and human activities have been identified as the primary drivers of vegetation change before and after 2000, respectively. However, the extent to which each driver contributes to the vegetation dynamics of the YRWCA remains uncertain. In this study, we introduce a modified deep Convolutional Long Short-Term Memory (ConvLSTM) model to quantify the contributions of climate change and human activities to vegetation change while accounting for spatiotemporal heterogeneity. We used the residual trend method to identify areas with minimal human activity before 2000 and trained the model on data from these areas. We then applied the trained deep ConvLSTM model to perform an attribution analysis for the period after 2000. The results show that the deep ConvLSTM effectively captures the impacts of climate change on vegetation growth and outperforms the widely used Random Forest (RF) model. Even though the RF input data were optimized, ConvLSTM clearly outperformed RF, achieving R², MAE, and RMSE values of 0.99, 0.013, and 0.018, respectively, compared with 0.94, 0.038, and 0.045 for RF. Since 2000, the regional normalized difference vegetation index (NDVI) has shown a broad increasing trend, particularly in drylands, driven primarily by human activities from 2006 to 2015. Furthermore, an analysis of regional land-use change, particularly in drylands, revealed that the conversion of farmland back to forest or grassland was greatest from 2000 to 2005. However, the largest contributions from human activities occurred from 2006 to 2015, indicating a time lag between these ecological programs and the resulting vegetation recovery. The attribution results provide valuable insights for implementing ecological programs, and the deep ConvLSTM demonstrates that deep learning models capturing spatiotemporal features are well suited to vegetation growth simulation, which supports their broader application.
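The residual trend idea used for attribution can be illustrated with a minimal sketch. It is not the study's implementation: the climate-driven NDVI is approximated here by an ordinary least-squares fit on a single hypothetical climate variable rather than by the deep ConvLSTM, and the function name `residual_trend` and the synthetic series are assumptions made for illustration only. The residual (observed NDVI minus climate-predicted NDVI) is treated as the non-climatic signal, and its slope over time indicates how strongly human activity may have influenced a pixel.

```python
import numpy as np

def residual_trend(ndvi, climate):
    """Residual trend for one pixel (illustrative sketch).

    ndvi:    array of shape (T,), observed annual NDVI
    climate: array of shape (T, K), climate predictors (hypothetical inputs)
    Returns (slope, residual): slope of the residual per time step and the
    residual series itself.
    """
    n_steps = ndvi.shape[0]
    # Climate-only model: a linear least-squares stand-in for the trained
    # model that predicts NDVI from climate.
    design = np.column_stack([np.ones(n_steps), climate])
    coef, *_ = np.linalg.lstsq(design, ndvi, rcond=None)
    predicted = design @ coef
    # Residual = observed minus climate-predicted NDVI.
    residual = ndvi - predicted
    # Linear trend of the residual over time.
    t = np.arange(n_steps, dtype=float)
    slope = np.polyfit(t, residual, 1)[0]
    return slope, residual

# Synthetic example (assumed values, not data from the study).
t = np.arange(18, dtype=float)
precip = 0.5 + 0.2 * np.sin(t)            # hypothetical climate driver
ndvi_climate_only = 0.3 + 0.5 * precip    # pixel driven by climate alone
ndvi_with_human = ndvi_climate_only + 0.01 * t  # added non-climatic trend

slope_natural, _ = residual_trend(ndvi_climate_only, precip[:, None])
slope_human, _ = residual_trend(ndvi_with_human, precip[:, None])
```

In this sketch, a pixel whose residual slope is close to zero would be a candidate "minimal human activity" pixel, while a clearly positive slope points to a non-climatic contribution to greening. Any threshold used to separate the two cases would be a further assumption.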