The development of reliable predictive algorithm for house price as the housing market is a stand-out among the most involved regarding valuing the price and continues to fluctuate, is constantly a need for socio-economic advancement and welfare of citizen. In this paper, we develop machine learning algorithms for forecasting UK housing Price, and find an optimal algorithm that forecasts housing price accurately on the premises of the presence of many features or covariates. After applying correlation analysis to remove correlated variables in order to avoid multicollinearity, thereby increasing the statistical power, a novel method of using regression analysis to first of all understand and select statistically significant features for the various regions in England based on North South divide is adopted. These features are then used in the machine learning algorithm to further increase the statistical power of the algorithm, increase the level of accuracy for each of them and ultimately increase the predictive values for the algorithms.The model construction involves 3 stages: 1- correlation analysis to identify and remove correlated variables thereby avoiding multicollinearity and increasing the statistical power of the linear regression, 2 - using linear regression to determine variables that are statistically significant and 3 - building the machine learning algorithms based on the variables that are statistically significant from the linear regression. A comprehensive dataset of UK Paid housing Price from 2010 to 2019 was linked to a number of other datasets to generate a total 21 variables or features used for the models. Catboost, Gradient Boosting, Bagging, Random Forest, Extra Tree all achieved the excellent model’s performance result in all the regions considered. The comparison of the seven models showed that Extra Tree algorithm consistently achieved the best performance in term of level of accuracy in all the regions. K-Nearest Neighbours (KNN) is the only algorithm with less than 50% level of accuracy. Noticeably, the regions considered had varying or differing insignificant variables, implying that although many variables are common (statistically significant) to all the regions, there are regional differences and impact when modelling or predicting housing prices. This study validates the practicability of developing a machine learning methodology for the prediction of housing price. This research offers a reference for future house price prediction based on machine learning.© 2024 The Author(s). Published by RITHA Publishing. This article is distributed under the terms of the license CC-BY 4.0., which permits any further distribution in any medium, provided the original work is properly cited.
Read full abstract