Accurate air quality forecasting is an integral component of early warning systems. It remains a formidable challenge owing to limited information on emission sources and the considerable uncertainties inherent in atmospheric dynamic processes. To improve forecasting accuracy, this work proposes a new spatiotemporal hybrid deep learning model based on variational mode decomposition (VMD), graph attention networks (GAT) and bi-directional long short-term memory (BiLSTM), referred to as VMD–GAT–BiLSTM. The proposed model first employs VMD to decompose the original PM2.5 data into a series of relatively stable sub-sequences, thereby reducing the influence of unknown factors on the model's prediction capability. For each sub-sequence, a GAT is then designed to explore deep spatial relationships among different monitoring stations. Next, a BiLSTM is used to learn the temporal features of each decomposed sub-sequence. Finally, the forecasts of the individual sub-sequences are summed to obtain the final air quality prediction. Experimental results on the collected Beijing air quality dataset show that the proposed model outperforms the compared methods on both short-term and long-term air quality forecasting tasks.
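The decompose, forecast and sum structure described above can be illustrated with a minimal sketch. It is not the proposed model: a moving-average peeling scheme stands in for VMD, a persistence rule stands in for the GAT and BiLSTM stages, and the function names, window sizes and PM2.5 values are hypothetical choices made only for illustration. The sketch relies on a single property, namely that the sub-sequences add back up to the original series, so summing the per-component forecasts yields a forecast of the original signal.

```python
from typing import Callable, List

Series = List[float]


def moving_average(x: Series, window: int) -> Series:
    """Centered moving average, truncated at the edges (same length as x)."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x: Series, windows: List[int]) -> List[Series]:
    """Stand-in for VMD (assumption): peel off smooth components in turn.

    The final residual is kept as the last component, so the returned
    components sum back to x element by element.
    """
    components, residual = [], list(x)
    for w in windows:
        smooth = moving_average(residual, w)
        components.append(smooth)
        residual = [r - s for r, s in zip(residual, smooth)]
    components.append(residual)
    return components


def persistence(component: Series) -> float:
    """Placeholder for the GAT and BiLSTM stages: repeat the last value."""
    return component[-1]


def forecast(x: Series, windows: List[int],
             predictor: Callable[[Series], float] = persistence) -> float:
    """Forecast each sub-sequence separately, then sum the forecasts."""
    return sum(predictor(c) for c in decompose(x, windows))


# Synthetic toy readings, not taken from the Beijing dataset.
pm25 = [35.0, 38.0, 42.0, 55.0, 61.0, 58.0, 47.0, 40.0, 36.0, 39.0]
parts = decompose(pm25, windows=[7, 3])
prediction = forecast(pm25, windows=[7, 3])
```

Any learned per-component predictor can be passed as `predictor` in place of `persistence`; the aggregation step is unchanged because it only adds the component forecasts.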