Atmospheric chemistry transport models have been widely applied in aerosol forecasting over recent decades, but they face challenges from uncertainties in emission rates and meteorological data and from over-simplified chemical parameterizations. Here, we developed a spatial-temporal deep learning framework, named PPN (Pollution-Predicting Net for PM2.5), to predict regional PM2.5 concentrations accurately and efficiently. PPN adopts an encoder-decoder architecture that combines preceding PM2.5 observations with numerical weather prediction data. In addition, we propose a weighted loss function to improve forecasting performance during extreme pollution events. We applied the model to forecast PM2.5 concentrations over the Beijing-Tianjin-Hebei region in China for the next 3 days at a 3-hour resolution. Overall, the model performed well, with an R² of 0.7 and an RMSE of 17.7 μg m−3. It captured the spatial pattern of high PM2.5 concentrations in the south and relatively low concentrations in the north, and it performed better within the first 24 h of the forecast. The weighted loss function reduced the model's tendency to underestimate high values and overestimate low values, while incorporating preceding PM2.5 observations into the encoder improved predictive accuracy within 24 h. We also compared the PPN results with those from a state-of-the-art numerical model (WRF-Chem with pollutant data assimilation). The temporal R² and RMSE of WRF-Chem ranged from 0.30 to 0.77 and from 19 to 45 μg m−3, respectively, whereas those of PPN ranged from 0.42 to 0.84 and from 15 to 42 μg m−3. These results show that PPN has strong capability in aerosol forecasting and provides an efficient and accurate tool for early warning and management of regional pollution events.
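The abstract does not specify how the weighted loss assigns weights, so the sketch below only illustrates the general idea under stated assumptions: squared errors on samples with higher observed PM2.5 receive larger weights, which pushes a model away from underestimating pollution peaks. The bin thresholds (75 and 150 μg m−3), the weights (1, 2, 4), and the function names are all hypothetical. The sketch also computes RMSE and R², with R² taken as the coefficient of determination; the paper may instead report the squared Pearson correlation.

```python
import math
from typing import Sequence

# HYPOTHETICAL concentration bins (μg m−3) and weights, chosen for illustration
# only; they are not the PPN paper's actual weighting scheme.
BIN_EDGES = (75.0, 150.0)
BIN_WEIGHTS = (1.0, 2.0, 4.0)


def sample_weight(y_obs: float) -> float:
    """Return a larger weight for higher observed PM2.5 concentrations."""
    if y_obs < BIN_EDGES[0]:
        return BIN_WEIGHTS[0]
    if y_obs < BIN_EDGES[1]:
        return BIN_WEIGHTS[1]
    return BIN_WEIGHTS[2]


def _check(y_obs: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return the sample count."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("y_obs and y_pred must be non-empty and equal length")
    return len(y_obs)


def weighted_mse(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted mean squared error, normalized by the sum of the weights."""
    _check(y_obs, y_pred)
    weights = [sample_weight(o) for o in y_obs]
    total = sum(w * (p - o) ** 2 for w, o, p in zip(weights, y_obs, y_pred))
    return total / sum(weights)


def rmse(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Unweighted root mean squared error."""
    n = _check(y_obs, y_pred)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def r_squared(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _check(y_obs, y_pred)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    obs = [30.0, 100.0, 200.0]   # observed PM2.5 (μg m−3), made-up values
    pred = [40.0, 90.0, 150.0]   # forecast PM2.5, underestimating the peak
    print(weighted_mse(obs, pred), rmse(obs, pred), r_squared(obs, pred))
```

With these made-up values, the weighted loss (about 1471) exceeds the unweighted mean squared error (900) because the largest error falls on the highest-concentration sample, which is the behavior a peak-emphasizing loss is meant to have.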