Accurately estimating house prices is important for various stakeholders, including house owners, real estate agencies, government agencies, and policy-makers. To this end, traditional statistics and, more recently, advanced machine learning and artificial intelligence (AI) models are used. Open Government Data (OGD) have huge potential, especially when combined with AI technologies. OGD are often published as linked data to facilitate data integration and reusability. Explainable Artificial Intelligence (XAI) can help stakeholders understand the decisions of a predictive model. This work creates a model that predicts house prices by applying machine learning to linked OGD. We present a case study that uses XGBoost, a powerful machine learning algorithm, and linked OGD from the official Scottish data portal to predict the probability that the mean house price in each data zone of Scotland is higher than the average price in Scotland. XAI is also used to explain the decisions of the model both globally and locally. The resulting model achieves a Receiver Operating Characteristic (ROC) AUC score of 0.923 and a Precision-Recall Curve (PRC) AUC score of 0.891. According to XAI, the variable that most affects the decisions of the model is the Comparative Illness Factor, an indicator of health conditions. However, local explainability shows that the decisions made in some data zones may be mostly affected by other variables, such as the percentage of detached dwellings and the employment-deprived population.
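As a rough illustration of the prediction target and of the ROC AUC metric reported above, the following sketch labels each data zone by whether its mean price exceeds the national average, then computes ROC AUC from model scores using the pairwise-ranking formulation. The zone prices, the national average, and the scores are made-up values, and the function names `label_zones` and `roc_auc` are hypothetical; this is not the study's pipeline, which trains XGBoost on linked OGD.

```python
from typing import List, Sequence


def label_zones(mean_prices: Sequence[float], national_average: float) -> List[int]:
    """Return 1 for zones whose mean price exceeds the national average, else 0."""
    return [1 if price > national_average else 0 for price in mean_prices]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up illustrative values, not data from the study.
zone_mean_prices = [120_000, 310_000, 95_000, 250_000, 180_000]
national_average = 200_000  # hypothetical figure
predicted_scores = [0.20, 0.90, 0.10, 0.40, 0.55]  # hypothetical model outputs

labels = label_zones(zone_mean_prices, national_average)
auc = roc_auc(labels, predicted_scores)
print(labels, auc)
```

The pairwise loop is quadratic in the number of zones, which is acceptable for a small illustration; a library routine based on sorting would be the usual choice at scale.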