Pollution is a major present-day concern, causing multiple illnesses and deaths, particularly in developing countries in Asia and Africa. Although it has drawn worldwide attention, with governments legislating limits on air pollution levels, forecasting pollutant concentrations remains a major challenge. In particular, short-term forecasting provides information on concentrations of harmful pollutants for the upcoming hours and enables better decision-making with regard to controlling air pollution. In this paper, we investigate spatio-temporal graph-based models to determine the best methods for spatial and temporal analysis of data. The models can also perform multivariate predictions of correlated data, i.e., predict multiple pollutant concentrations simultaneously, thus reducing computational effort. A real-world pollution dataset measured over Delhi, India, is used to compare the proposed models with baselines, and shows that the Spatio-Temporal Graph Convolution Neural Network (STGCN) models perform better than the others. To better understand which model architectures offer the most effective strategies for spatial and temporal data analysis, three models, namely STGCN-A, STGCN-B, and STGCN-C, have been developed. The models are compared with six baselines over multiple forecasting horizons of 1 h, 24 h, and 48 h using mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). On the PM2.5 dataset of Delhi, STGCN-B achieves a performance of 10.53 MAE, 6.92 RMSE and 25.25 MAPE for a 1 h forecast, while STGCN-C achieves 20.18 MAE, 14.73 RMSE and 55.45 MAPE for a 24 h forecast. In general, both structures achieve similar results, with STGCN-C being better in many cases. They are further analysed through observation-prediction graphs and Taylor diagrams, which give insight into our findings.
The models are additionally validated on a benchmark real-world dataset from California, USA, to better understand the spatio-temporal relations and model performance on a more stable dataset. There, STGCN-C performs best for PM2.5, with 4.30 RMSE, 1.98 MAE, and 25.96 MAPE for 1 h predictions on univariate data, and 3.63 RMSE, 1.88 MAE, and 25.91 MAPE in multivariate forecasting. The developed spatio-temporal graph-based models hold promising applications in urban air quality management, aiding policymakers in implementing targeted interventions to mitigate pollution-related health risks. Furthermore, these models can support public health agencies by providing timely and accurate forecasts of pollutant concentrations, enabling proactive measures to safeguard community well-being. Our study demonstrates the efficacy of spatio-temporal graph-based models in forecasting air pollutant concentrations, with particular emphasis on short-term predictions. By leveraging their multivariate capabilities, the proposed models outperform the baseline approaches across the forecasting horizons considered.
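
For readers unfamiliar with the three error metrics used above, the following is a minimal sketch of how MAE, RMSE, and MAPE are commonly computed from paired observations and predictions. The function names and the sample values are hypothetical and are not taken from the Delhi or California datasets. Skipping zero-valued observations in MAPE is an assumption made here to avoid division by zero; the evaluation code behind the reported results may treat that case differently.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: observations equal to zero are skipped, since the
    relative error is undefined for them.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)


# Hypothetical hourly PM2.5 readings and forecasts (illustrative values only).
observed = [100.0, 200.0, 50.0, 80.0]
predicted = [110.0, 180.0, 55.0, 80.0]

print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"MAPE = {mape(observed, predicted):.2f}")
```

MAE and RMSE are expressed in the units of the pollutant concentration, whereas MAPE is a percentage of the observed value, which is why it can remain comparatively large when observed concentrations are low.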