Poor air quality produces detrimental effects worldwide; hence, it is vital to thoroughly characterize air pollution sources and to effectively address and mitigate such effects. Due to the complexity of the underlying physical processes and the uncertainties in the available observations, the air pollution source identification problem is typically cast within a Bayesian inversion framework, which combines prior knowledge with observations to characterize a release event through the posterior distribution of the source parameters. In this study, we rely on two-dimensional (2D) pollutant concentration distributions as observations and adopt the Wasserstein (W2) distance to define the likelihood of the observations for given emission parameters. Because the posterior distribution is estimated via random sampling that involves many forward model runs, the Bayesian framework can be computationally prohibitive for realistic urban air pollution problems driven by computationally demanding micro-scale flow simulations. Furthermore, computing the W2 distance is itself resource-intensive. In this context, we develop a computationally efficient Bayesian framework based on (i) a two-stage approach that reduces the cost of the Bayesian inversion and (ii) an artificial intelligence (AI) approximation of the W2 distance. In the two-stage approach, a low-resolution dispersion model is run in the first stage to propose representative samples of the emission parameters for final selection by the original high-resolution model in the second stage. In addition, we approximate the W2 distance with a deep neural network (DNN), achieving an appreciable reduction in computational cost with negligible loss in inversion performance. We design numerical experiments to assess the sensitivity of the inverse solution to the characteristics of the approximate model. The results indicate that pairing the two-stage approach with the DNN approximation of the W2 distance preserves the quality of the inverse solution while reducing the computational cost by at least a factor of three.
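For concreteness, the sketch below illustrates one common way such a two-stage sampler can be realized: a delayed-acceptance-style Metropolis-Hastings scheme in which a cheap low-resolution posterior screens proposals before the expensive high-resolution posterior is evaluated. This is a minimal, hypothetical sketch, not the paper's actual implementation; the names `two_stage_mh`, `log_post_coarse`, and `log_post_fine`, the Gaussian random-walk proposal, and the toy Gaussian posteriors in the usage example are assumptions standing in for the dispersion models, prior, and W2-based likelihood described above.

```python
import numpy as np

def two_stage_mh(log_post_coarse, log_post_fine, theta0, n_iter=5000,
                 prop_std=0.1, rng=None):
    """Two-stage (delayed-acceptance) Metropolis-Hastings sketch.

    Stage 1 screens each proposal with a cheap, low-resolution posterior;
    only proposals that survive are evaluated with the expensive,
    high-resolution posterior in stage 2.  A symmetric Gaussian
    random-walk proposal is assumed, so the proposal densities cancel.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lc, lf = log_post_coarse(theta), log_post_fine(theta)
    chain, n_fine_evals = [theta.copy()], 1

    for _ in range(n_iter):
        prop = theta + prop_std * rng.standard_normal(theta.shape)
        lc_prop = log_post_coarse(prop)

        # Stage 1: accept/reject using the coarse posterior only.
        if np.log(rng.uniform()) < lc_prop - lc:
            # Stage 2: correct with the fine posterior so the chain
            # still targets the high-resolution posterior.
            lf_prop = log_post_fine(prop)
            n_fine_evals += 1
            if np.log(rng.uniform()) < (lf_prop - lf) + (lc - lc_prop):
                theta, lc, lf = prop, lc_prop, lf_prop
        chain.append(theta.copy())

    return np.array(chain), n_fine_evals


# Toy usage with hypothetical posteriors: the "coarse" one is a biased,
# widened stand-in for the "fine" one, mimicking a low-resolution model.
log_fine = lambda t: -0.5 * np.sum((t - 1.0) ** 2)
log_coarse = lambda t: -0.5 * np.sum((t - 1.1) ** 2) / 1.2
samples, n_fine = two_stage_mh(log_coarse, log_fine, theta0=[0.0, 0.0])
```

The computational saving comes from the stage-1 screening: proposals rejected by the cheap model never trigger a high-resolution evaluation, which is where the reported cost reduction of the two-stage approach originates.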