Mapping near-field pollutant concentration is essential to track plume dispersion in urban areas. By solving most of the turbulence spectrum, large-eddy simulations (LES) can accurately represent concentration spatial variability. Synthesizing this large amount of information to improve the accuracy of lower-fidelity operational models is particularly appealing. Due to the computational cost of LES, it becomes a challenge in multi-query contexts, to quantify how tracer dispersion changes with atmospheric and source parametric variations. We propose a data-driven reduced-order model (ROM) combining proper orthogonal decomposition (POD) and probabilistic Gaussian process regression (GPR) to efficiently predict tracer concentration field first- and second-order statistics. The novelty of this work lies in the original approach used to efficiently tune the GPR based on the available reduced-basis information and properties. In practice, the GPR hyperparameters are optimized through the hierarchy of POD modes thanks to Bayesian optimization informed by POD. The robustness of the approach is also addressed under small data constraint. A detailed analysis of our model performance is carried out in the case of a turbulent atmospheric boundary-layer flow over a surface-mounted obstacle for which the convective character of the system together with the uncertainty of the pollutant source location make it very challenging for ROM. The model is able to capture the wide range of spatial scales across the POD modes. The ROM accuracy remains acceptable for a minimal budget of about one hundred LES snapshots, providing a first estimation moving towards more realistic atmospheric dispersion applications.