With the rapid development of information technology, demand for digital agriculture is growing. Crop yield is a central concern in agricultural production, and artificial intelligence, particularly machine learning, has become the leading approach for predicting it. Developing a machine learning method that predicts crop yield accurately has therefore become one of the central challenges in digital agriculture. Unlike traditional regression problems, crop yield prediction involves strong temporal correlation: for example, the weather data for each county are strongly correlated over time. Geographic relationships between regions also affect crop yield to some extent: if a county’s neighboring counties have a good harvest, that county is likely to have high yields as well. This paper introduces a novel hybrid deep learning framework that combines convolutional neural network (CNN), graph attention network (GAT), and long short-term memory (LSTM) modules to improve prediction accuracy. Specifically, the CNN extracts features from the input data for each county in each year; the GAT models the geographical relationships between neighboring counties, allowing the model to capture spatial dependencies more effectively; and the LSTM extracts temporal information across years. The proposed CNN-GAT-LSTM framework thus captures both temporal and spatial relationships, thereby improving the accuracy of yield prediction. We conduct experiments on a nationwide dataset covering 1115 soybean-producing counties in 13 U.S. states over the years 1980 to 2018. We evaluate the proposed CNN-GAT-LSTM model using three metrics: root mean squared error (RMSE), R-squared (R2), and the correlation coefficient (Corr). 
The experimental results demonstrate that the proposed model achieves significant performance improvements over the existing state-of-the-art model, with RMSE reduced by 5%, R2 improved by 6% and Corr enhanced by 4%.
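
For readers unfamiliar with the three evaluation metrics, the following is a minimal sketch of how they are commonly computed. It assumes that R2 denotes the coefficient of determination (one minus the ratio of residual to total sum of squares) and that Corr denotes the Pearson correlation coefficient; the abstract does not state which definitions are used, so these are assumptions. The function names and the sample yield values are hypothetical and serve only as illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted yields."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_corr(y_true, y_pred):
    """Pearson correlation coefficient (assumed definition of Corr)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up county yields for illustration only; not data from the paper.
observed = [40.0, 45.0, 50.0, 55.0]
predicted = [42.0, 44.0, 49.0, 57.0]

print("RMSE:", rmse(observed, predicted))
print("R2:  ", r_squared(observed, predicted))
print("Corr:", pearson_corr(observed, predicted))
```

Lower RMSE indicates better predictions, whereas higher R2 and Corr indicate better agreement between predicted and observed yields, which is why the reported improvements move in opposite directions for RMSE and for the other two metrics.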