Accurate long-term (6–24 h) prediction of PM2.5 is critical to human health and daily life. Although deep learning has been used extensively to forecast PM2.5, prior studies have relied primarily on shallow recurrent neural networks (RNNs), which may accumulate errors over the forecast horizon and thus limit long-term prediction capability. To address this issue, this study proposes a hybrid model that combines Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) with a deep Transformer neural network (DeepTransformer) to improve the accuracy of long-term PM2.5 forecasting. The model includes a new embedding layer that efficiently models historical, meteorological, and discrete-time data. In addition, to improve the long-term inference capability of DeepTransformer, a non-autoregressive direct multi-step (DMS) prediction strategy is introduced, and a novel DMS decoder replaces the vanilla Transformer decoder. Experiments on two public datasets show that the proposed model achieves R² = 0.984 and RMSE = 11.61 µg/m³ for 1-hour prediction, and R² = 0.704 and RMSE = 30.78 µg/m³ for 24-hour prediction. Compared with single models, DeepTransformer achieves a 30% decrease in MAE, a 27% decrease in RMSE, and a 59% increase in R² for long-term (24-hour) prediction of PM2.5.
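
To make the direct multi-step (DMS) strategy and the reported metrics concrete, the following is a minimal sketch and not the model proposed in this study. It assumes a univariate hourly PM2.5 series, and it uses an ordinary least-squares map as a stand-in for the DMS decoder, so that all horizon steps are predicted in one shot rather than fed back one at a time. All function names (make_dms_windows, fit_direct_linear, predict_direct, mae, rmse, r2) are hypothetical and chosen only for illustration; the synthetic series is not data from the study.

```python
import numpy as np

def make_dms_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, multi-step target) pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_direct_linear(X, Y):
    """Least-squares map from one input window to all horizon steps at once.

    Assumption: a linear model stands in for the Transformer-based DMS decoder.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_direct(W, window):
    """Predict the whole horizon from one window, with no feedback of outputs."""
    window = np.asarray(window, dtype=float)
    return np.append(window, 1.0) @ W

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in for an hourly PM2.5 series (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(600)
pm25 = 60 + 25 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, size=t.size)

# 48 h of history in, 24 h of forecast out, all 24 steps produced together.
X, Y = make_dms_windows(pm25, lookback=48, horizon=24)
split = int(0.8 * len(X))
W = fit_direct_linear(X[:split], Y[:split])
Y_hat = np.stack([predict_direct(W, x) for x in X[split:]])

# Score the last horizon step, i.e. the 24-hour-ahead prediction.
print("24 h MAE :", mae(Y[split:, -1], Y_hat[:, -1]))
print("24 h RMSE:", rmse(Y[split:, -1], Y_hat[:, -1]))
print("24 h R2  :", r2(Y[split:, -1], Y_hat[:, -1]))
```

In this sketch, each column of the target matrix corresponds to one lead time, so per-horizon metrics such as the 1-hour and 24-hour scores can be read off by selecting the first or last column.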