Monitoring particulate matter (PM) air pollution, in terms of both concentration and composition, is very important because of its effects on human health and climate. In the PRIMARY project, we aim to retrieve aerosol composition from space using hyperspectral observations from the Italian Space Agency's PRISMA mission. To this end, we are developing a machine learning algorithm trained on synthetic top-of-atmosphere reflectances and the underlying aerosol fields. We plan to use the global forecasts of the Copernicus Atmosphere Monitoring Service (CAMS) as the core for generating this synthetic dataset. Before proceeding in this direction, however, the reliability of this model-based dataset must be assessed against observations, both to validate it and to bias-correct the model output if needed. With this aim, we assess how CAMS represents aerosol chemical composition and the related optical properties at selected, globally distributed sites, comparing the simulations with near-surface aerosol chemical analyses from the SPARTAN network and column sun-photometer observations from the AERONET network. We found that the skill of the CAMS forecasts changed over time because of updates to the modelling system, with the last two model cycles (46 and 47) performing similarly. In general, the forecasts reproduce the aerosol composition within a factor of 2, but organic matter (OM) is substantially overestimated, by a factor of 3. Applying a globally constant correction factor to OM yields a much more realistic representation of total PM2.5 mass and of the relative fractions of individual species in CAMS. From the resulting speciated aerosol profiles, we calculate the aerosol optical properties needed as input to a radiative transfer model. Comparison against AERONET shows that the OM bias correction improves the Extinction Ångström Exponent (α, 440–870 nm).
Simulations of Aerosol Optical Depth (AOD), Single Scattering Albedo (SSA) and Asymmetry Parameter (g), by contrast, are only slightly degraded. Overall, these results confirm that CAMS can be used as the basis for a synthetic training dataset for the retrieval.
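The OM bias correction and the evaluation quantities mentioned above can be illustrated with a minimal Python sketch. It assumes that the globally constant correction simply divides modelled OM mass by 3 (the inverse of the reported overestimation; the factor actually applied may differ), uses made-up species concentrations rather than CAMS or SPARTAN data, treats "within a factor of 2" as a model/observation ratio in [0.5, 2], and computes the Extinction Ångström Exponent from AOD at 440 and 870 nm under the usual power-law assumption. All names are hypothetical.

```python
import math

# Assumption: OM is divided by 3, the inverse of the ~3x overestimation.
# The correction factor actually used in the study may differ.
OM_OVERESTIMATE = 3.0

# Hypothetical speciated PM2.5 at one site, in µg/m³ (illustrative numbers,
# not CAMS output or SPARTAN measurements).
model = {"OM": 12.0, "sulfate": 3.0, "nitrate": 2.0, "dust": 2.0, "BC": 1.0}


def correct_om(species, overestimate=OM_OVERESTIMATE):
    """Return a copy of the speciated masses with OM divided by `overestimate`."""
    corrected = dict(species)
    corrected["OM"] = species["OM"] / overestimate
    return corrected


def mass_fractions(species):
    """Fraction of total PM2.5 mass contributed by each species."""
    total = sum(species.values())
    return {name: mass / total for name, mass in species.items()}


def within_factor_of_2(modelled, observed):
    """True when the model/observation ratio lies in [0.5, 2]."""
    ratio = modelled / observed
    return 0.5 <= ratio <= 2.0


def angstrom_exponent(aod_440, aod_870):
    """Extinction Ångström exponent between 440 and 870 nm.

    Assumes a power law AOD(λ) ∝ λ^(-α), which gives
    α = -ln(AOD_440 / AOD_870) / ln(440 / 870).
    """
    return -math.log(aod_440 / aod_870) / math.log(440.0 / 870.0)


corrected = correct_om(model)
print(sum(model.values()), sum(corrected.values()))  # 20.0 12.0
print(round(mass_fractions(model)["OM"], 2), round(mass_fractions(corrected)["OM"], 2))  # 0.6 0.33
# Compare modelled OM with a hypothetical observed OM of 4.5 µg/m³.
print(within_factor_of_2(model["OM"], 4.5), within_factor_of_2(corrected["OM"], 4.5))  # False True
print(angstrom_exponent(0.4, 0.2))  # AOD halving from 440 to 870 nm gives α ≈ 1
```

Because OM dominates the hypothetical site's mass, rescaling that single species both lowers total PM2.5 and shifts every species' relative fraction, which is why a single global factor can affect both quantities at once.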