Abstract

This paper addresses the automated learning of air pollution prediction models trained on data gathered by a set of mobile low-cost sensors. Concretely, fast-to-compute machine learning methods (Decision Trees and Support Vector Machines) were used to build regression models that predict air pollution levels for a given location. The models were trained on data collected by the OpenSense project, in particular the number of particles, particle diameter, and lung deposited surface area (LDSA). We examined two different sets of attributes: one based on a geographical description of the location under analysis (e.g., distribution of households and roads), and another based on a time series of past air pollution observations at that location. Overall, we found that past measurements lead to better pollution predictions. The best R2 score, 0.751, was obtained with the model that predicts LDSA and was trained on the data set with time series attributes, while the worst R2, 0.009, was obtained with the geographical data set when predicting the number of particles. The performance of the best model is on par with similar air pollution systems. Moreover, it can be used in a production system that requires frequent updates.
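
To make the time series attributes and the R2 score concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each training row consists of the previous `n_lags` observations at a location and that R2 is the usual coefficient of determination, 1 - SS_res / SS_tot. The helper names `make_lag_features` and `r2_score`, the window length, and the readings are all hypothetical and chosen only for illustration. A naive "repeat the last observation" baseline stands in for the Decision Tree and Support Vector Machine regressors.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (attributes, target) pairs from a series.

    Each attribute row holds the n_lags observations that precede the
    target value. This windowing scheme is an assumption made for
    illustration, not the attribute definition used in the paper.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return rows, targets


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up readings for illustration only (NOT OpenSense data).
readings = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
X, y = make_lag_features(readings, n_lags=2)

# Naive baseline: predict that the next value equals the latest observation.
baseline_pred = [row[-1] for row in X]
baseline_r2 = r2_score(y, baseline_pred)
```

In this sketch a perfect predictor scores 1.0 and a predictor that always returns the mean of the targets scores 0.0, which gives a scale for reading the 0.751 and 0.009 values reported above.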

