Air is an essential natural resource, and the Air Quality Index (AQI) is an important indicator that intuitively reflects air quality. Accurate AQI prediction is critical for controlling air pollution. This study proposes a new spatio-temporal correlation hybrid prediction model, ST-EXMG-AE-XGBoost, which is built in three stages. In the first stage, Pearson correlation analysis is performed on different AQI monitoring stations to capture their temporal correlation characteristics, and the prediction models T-ELM, T-XGBoost, and T-MLP are built in turn. In the second stage, a graph attention network (GAT) captures the spatial correlation characteristics, yielding the prediction model S-GAT. In the third stage, the prediction results of T-ELM, T-XGBoost, T-MLP, and S-GAT are learned and reconstructed into features by an autoencoder (AE), and the spatio-temporal correlation prediction is then performed using XGBoost. A prediction study on historical data from AQI monitoring stations in Shunyi District, Beijing, leads to the following conclusions: (a) compared with deep learning models such as LSTM and TCN, classical models such as ELM, MLP, and XGBoost still have good application value, with high prediction accuracy and short training time; (b) compared with single prediction models, the proposed temporal correlation prediction methods T-ELM, T-MLP, and T-XGBoost further improve AQI prediction accuracy; (c) the GAT method can effectively explore the spatial connectivity characteristics between different AQI monitoring stations; (d) the proposed spatio-temporal correlation hybrid prediction model ST-EXMG-AE-XGBoost performs best among all the models involved and has good application prospects.
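
The first stage described above relies on Pearson correlation analysis across monitoring stations. The following is a minimal sketch of that idea only, not the study's implementation: the function name `correlated_stations`, the 0.8 threshold, and the synthetic three-station data are all assumptions made for illustration. It computes the station-by-station Pearson correlation matrix and returns the stations whose AQI series correlate strongly with a chosen target station.

```python
import numpy as np


def correlated_stations(aqi, target, threshold=0.8):
    """Return (indices, r): stations whose Pearson r with `target` is >= threshold.

    aqi: array of shape (n_timesteps, n_stations), one column per station.
    The threshold value is an illustrative assumption, not taken from the study.
    """
    # Columns are variables (stations), rows are observations (time steps).
    r = np.corrcoef(aqi, rowvar=False)
    mask = r[target] >= threshold
    mask[target] = False  # a station is trivially correlated with itself
    return np.flatnonzero(mask), r


# Synthetic, hypothetical data: stations 0 and 1 share a common signal,
# station 2 is generated independently.
rng = np.random.default_rng(0)
base = rng.normal(100, 30, size=500)
aqi = np.column_stack([
    base + rng.normal(0, 5, size=500),  # station 0 (target)
    base + rng.normal(0, 5, size=500),  # station 1 (shares the signal)
    rng.normal(100, 30, size=500),      # station 2 (independent)
])

selected, r = correlated_stations(aqi, target=0)
print("stations correlated with station 0:", selected)
```

In a pipeline of the kind the abstract outlines, the series of the selected stations could be supplied as additional inputs to the temporal models. How the study actually uses the correlation results is not specified here, so that step is left out of the sketch.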