Modeling PM2.5 and PM10 Using a Robust Simplified Linear Regression Machine Learning Algorithm

João Gregório,Pedro J S B Caridade,Carla Gouveia-Caridade

doi:10.3390/atmos13081334

Abstract

The machine learning algorithm based on multiple-input multiple-output linear regression models has been developed to describe PM2.5 and PM10 concentrations over time. The algorithm is fact-acting and allows for speedy forecasts without requiring demanding computational power. It is also simple enough that it can self-update by introducing a recursive step that utilizes newly measured values and forecasts to continue to improve itself. Starting from raw data, pre-processing methods have been used to verify the stationary data by employing the Dickey–Fuller test. For comparison, weekly and monthly decompositions have been achieved by using Savitzky–Golay polynomial filters. The presented algorithm is shown to have accuracies of 30% for PM2.5 and 26% for PM10 for a forecasting horizon of 24 h with a quarter-hourly data acquisition resolution, matching other results obtained using more computationally demanding approaches, such as neural networks. We show the feasibility of using multivariate linear regression (together with the small real-time computational costs for the training and testing procedures) to forecast particulate matter air pollutants and avoid environmental threats in real conditions.

Full Text