Accurate prediction of PM2.5 concentrations in ports is crucial for authorities to combat ambient air pollution effectively and to protect the health of port staff. However, in port clusters formed by multiple neighboring ports, such prediction faces several challenges: unique meteorological conditions, potential correlations between PM2.5 levels in neighboring ports, and the coupled influence of background pollutants from urban zones. Therefore, considering the spatiotemporal correlations among the factors that drive PM2.5 concentration variations within the port cluster, we developed a novel blending ensemble deep learning model. The proposed model combined the strengths of four deep learning architectures: graph convolutional networks (GCN), long short-term memory networks (LSTM), residual neural networks (ResNet), and convolutional neural networks (CNN). GCN, LSTM, and ResNet served as the base models, capturing, respectively, the spatial correlation of PM2.5 concentrations among neighboring ports, the potential long-term dependence between meteorological factors and PM2.5 concentrations, and the effects of urban ambient air pollutants. Following the blending ensemble technique, the predictions of the three base models were used as the input to the meta-model, a CNN, which produced the final predictions. Using actual data from 18 ports in Nanjing, we compared the prediction performance of the proposed model against six state-of-the-art models. The proposed model provided more accurate predictions: it reduced the mean absolute error (MAE) by 10.59%–20.00% and the root mean square error (RMSE) by 13.22%–17.11%, and it improved the coefficient of determination (R²) by 10%–35.38% and the accuracy (ACC) by 3.48%–7.08%.
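The blending step described above can be illustrated with a minimal sketch. It assumes NumPy and, to stay short and self-contained, replaces the GCN, LSTM, and ResNet base models and the CNN meta-model with ordinary least-squares regressors; the feature groups, the 30% holdout fraction, the synthetic data, and all function names are hypothetical and are not taken from the paper. What it shows is the data flow that defines blending: base models are trained on one part of the training data, and the meta-model is trained only on their predictions for a held-out part they never saw.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear regression with an intercept; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def blend(X_train, y_train, X_test, feature_groups, holdout_frac=0.3):
    """Blending ensemble with linear stand-ins for the base and meta models.

    The last `holdout_frac` of the (time-ordered) training rows is held out:
    base models are fit on the earlier rows, and the meta-model is fit on the
    base models' predictions for the held-out rows.
    """
    n_hold = int(len(X_train) * holdout_frac)
    X_fit, y_fit = X_train[:-n_hold], y_train[:-n_hold]
    X_hold, y_hold = X_train[-n_hold:], y_train[-n_hold:]

    # One base model per feature group (hypothetical stand-ins for GCN/LSTM/ResNet).
    base = [fit_linear(X_fit[:, g], y_fit) for g in feature_groups]

    def meta_features(X):
        # One column per base model's prediction.
        return np.column_stack([predict_linear(c, X[:, g])
                                for c, g in zip(base, feature_groups)])

    # Meta-model (linear stand-in for the CNN) learns how to combine base outputs.
    meta = fit_linear(meta_features(X_hold), y_hold)
    return predict_linear(meta, meta_features(X_test))


# Usage on synthetic data: three feature groups standing in for neighboring-port
# PM2.5, meteorology, and urban background pollutants (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = X @ np.array([1.0, 0.5, -0.3, 0.8, 0.2, -0.6]) + rng.normal(scale=0.1, size=2000)
groups = [[0, 1], [2, 3], [4, 5]]
y_pred = blend(X[:1600], y[:1600], X[1600:], groups)
print("test MAE:", np.mean(np.abs(y[1600:] - y_pred)))
```

Holding out the chronologically last slice of the training data, rather than a random sample, keeps the meta-model from being fit on rows that precede data the base models were trained on, which matters for time series such as hourly PM2.5 records.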
Additionally, a systematic ablation study measured the contribution of each component to the prediction performance of the proposed model. The results showed that the GCN component had the largest influence on the prediction performance of the GCN-LSTM-ResNet model, followed by the LSTM component. Accounting for the influence of urban background pollutants can significantly enhance the generalizability of the complete model. Moreover, compared with three blending ensemble models that each combine two of the three base models, the GCN-LSTM-ResNet model exhibited superior prediction performance and was particularly good at predicting high-concentration events. Specifically, the GCN-LSTM-ResNet model reduced MAE and RMSE by at least 12.3% and 9.2%, respectively, and improved R² and ACC by 26.1% and 6.8%, respectively. The proposed model provides reliable PM2.5 concentration predictions and decision support for air quality management strategies in dry bulk port clusters.
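For reference, the error metrics and percentage changes quoted above can be computed as in the following sketch (Python with NumPy). It assumes the percentages are relative changes with respect to the comparison model rather than differences in percentage points; the abstract does not define ACC, so it is omitted here. The function names and the example values are illustrative, not data from the study.

```python
import numpy as np


def mae(y, p):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - p)))


def rmse(y, p):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - p) ** 2)))


def r2(y, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return float(1.0 - np.sum((y - p) ** 2) / np.sum((y - np.mean(y)) ** 2))


def relative_change(proposed, baseline):
    """Signed relative change of `proposed` versus `baseline`, in percent.

    Negative values mean the metric went down (an improvement for MAE/RMSE);
    positive values mean it went up (an improvement for R^2).
    """
    return 100.0 * (proposed - baseline) / baseline


y_true = np.array([10.0, 20.0, 30.0, 40.0])  # illustrative values, not paper data
y_hat = np.array([12.0, 18.0, 33.0, 39.0])
print(mae(y_true, y_hat))          # 2.0
print(rmse(y_true, y_hat))         # about 2.121
print(r2(y_true, y_hat))           # about 0.964
print(relative_change(2.0, 2.5))   # -20.0, i.e. a 20% MAE reduction
```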