This study applied two data mining tasks, clustering and association rule mining, to a dataset of pollutants in the state of São Paulo. Clustering was applied to the temporal patterns and geospatial distributions of pollutants, and association rules were used to identify the meteorological conditions prevailing during episodes of high pollutant concentration from 2017 to 2019. The clustering fit the data adequately (silhouette coefficients from 0.26 to 0.72) and separated groups with different pollution levels. In the spatial evaluation, the severely polluted groups were located in the metropolitan region, on the coast, and in some inland cities, driven by industrial, vehicular, burning, agricultural, and other emissions. The clustering identified a strong presence of O3 and PM2.5 at 65% and 72% of the monitored stations, respectively, in several areas of the state. Regarding the distance between pollution sources, the PM10 and NO2 groups were geographically distant, whereas the PM2.5, CO, SO2, and O3 groups were closer, suggesting a spatial relationship of exposure. Seasonality was similar across groups, with significantly higher concentrations in winter, except for O3, whose concentrations were higher in summer. Meteorological conditions contributed to critical pollution episodes (support and confidence greater than 80%): low temperature and humidity, low rainfall, and weaker winds were associated with increased pollutant concentrations. In conclusion, investigating spatial representativeness reveals the spatial and temporal patterns of pollutants and the meteorological conditions unfavorable to their dispersion. Effective measures can therefore be taken to avoid critical periods of exposure, based on the behavior of pollutants in different regions and related climate changes.
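To make the support and confidence thresholds reported above concrete, the following is a minimal Python sketch of how the two measures are computed for a single association rule. The records, item labels (`low_humidity`, `low_wind`, `high_PM10`), and function names are hypothetical illustrations, not the study's data or implementation.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent = frozenset(antecedent)
    both = support(transactions, antecedent | frozenset(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical toy records: one set of discretized conditions per observation.
records = [
    frozenset({"low_humidity", "low_wind", "high_PM10"}),
    frozenset({"low_humidity", "low_wind", "high_PM10"}),
    frozenset({"low_humidity", "low_wind", "high_PM10"}),
    frozenset({"low_humidity", "low_wind", "high_PM10"}),
    frozenset({"low_humidity", "high_wind"}),
]

# Rule under test: {low_humidity, low_wind} -> {high_PM10}
rule_support = support(records, {"low_humidity", "low_wind", "high_PM10"})
rule_confidence = confidence(records, {"low_humidity", "low_wind"}, {"high_PM10"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data the rule holds in four of five records, so a rule of this kind would meet an 80% support threshold; confidence measures how often the consequent appears among the records that satisfy the antecedent.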