Abstract

In this work, we provide a statistical procedure for integrating expert preferences regarding explanatory variables into stepwise forward regression. The proposed method builds on traditional stepwise linear regression and goal programming. The procedure is validated experimentally on real-life data from various sources, with the aim of predicting air pollution. The practical goal is to predict the annual concentrations of two health-related air pollutants, namely PM10 (particulate matter that is 10 micrometers or less in diameter) and NO2 (nitrogen dioxide). The main finding is that including expert knowledge leads to more robust and accurate predictive models. Given the limited amount of data available from air pollution monitoring stations, the additional expert knowledge made it possible to select the most meaningful explanatory variables and, as a consequence, the statistical inference led to improved predictions. The main contribution of this work is a simple but solid expert-in-the-loop stepwise forward linear regression method that incorporates expert preferences. Experiments confirm that the proposed procedure is not only more interpretable but also delivers more accurate predictions of the considered air pollutant concentrations.
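
To make the idea of expert-in-the-loop forward selection concrete, the following is a minimal sketch and not the procedure proposed in the paper. It assumes that expert preferences are given as one score in [0, 1] per candidate variable and that they are blended with the statistical gain through a simple weighted sum; the paper itself relies on goal programming, which is not reproduced here. The names `expert_forward_selection`, `preference`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def expert_forward_selection(X, y, preference, max_vars=3, alpha=0.5):
    """Greedy forward selection with an expert term (illustrative assumption).

    Each candidate is scored as
        (1 - alpha) * relative RSS reduction + alpha * preference[c],
    a weighted-sum stand-in for the goal-programming step of the paper.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    current = rss(X, y, selected)  # intercept-only model
    while remaining and len(selected) < max_vars:
        if current <= 1e-12:  # already a perfect fit, nothing left to explain
            break
        scores = {}
        for c in remaining:
            gain = (current - rss(X, y, selected + [c])) / current
            scores[c] = (1 - alpha) * gain + alpha * preference[c]
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        current = rss(X, y, selected)
    return selected


# Synthetic data (not from the paper): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=60)

data_driven = expert_forward_selection(X, y, [0, 0, 0, 0], max_vars=2, alpha=0.0)
expert_driven = expert_forward_selection(X, y, [0, 1, 0, 0], max_vars=1, alpha=1.0)
```

With `alpha=0.0` the sketch reduces to ordinary forward selection driven by the data alone, while `alpha=1.0` follows the expert ranking only; intermediate values trade the two off, which is the kind of compromise the abstract describes.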
