Analysis and Forecasting of California Housing

Yucong Chen

doi:10.54097/hbem.v3i.4704

Abstract

House prices have a significant impact on people's daily lives: a stable home is essential for living and working, and for social prosperity and stability. Predicting house prices is therefore a meaningful and challenging task. In this project, we use the California Census dataset to examine how distinctive features (attributes) raise or lower house prices. The main idea is to build a regression model that learns from this data and predicts the price of a house in any block, given the useful features provided in the dataset. For the regression task, we applied K-fold cross-validation to Ridge, Random Forest, and Gradient Boosting models to select the optimal hyperparameters. We then applied the best selected models to the test set; the results show decent performance for Random Forest and Gradient Boosting. Random Forest performs best, with an MSE (Mean Squared Error) of 0.290 and a training time of 14.7 seconds. Gradient Boosting reaches a slightly higher MSE of 0.295 but takes a shorter training time (2.91 s).
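The model selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: it assumes scikit-learn is available, uses a small synthetic dataset from `make_regression` as a stand-in for the California Census data, and the hyperparameter grids, the 5-fold setting, and the 80/20 split are all hypothetical choices made only for this example.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in data (assumption): 8 numeric features, like a small
# tabular housing dataset. Replace with the real features and target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models with illustrative (hypothetical) hyperparameter grids.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50], "max_depth": [4, None]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=0),
        {"n_estimators": [50], "learning_rate": [0.05, 0.1]},
    ),
}

# K-fold cross-validation on the training set only; the test set stays unseen.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_mean_squared_error")
    start = time.perf_counter()
    search.fit(X_train, y_train)  # grid search plus a final refit on all training data
    elapsed = time.perf_counter() - start
    # GridSearchCV.predict uses the refitted best estimator.
    test_mse = mean_squared_error(y_test, search.predict(X_test))
    results[name] = {
        "best_params": search.best_params_,
        "test_mse": test_mse,
        "search_seconds": elapsed,
    }

for name, r in results.items():
    print(f"{name}: params={r['best_params']} "
          f"test MSE={r['test_mse']:.3f} search time={r['search_seconds']:.2f}s")
```

Note that the timing here covers the whole grid search, not the training of a single model, so it is not directly comparable to the training times reported in the abstract. On the synthetic linear data the ranking of the models may also differ from the one reported for the real dataset.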

Full Text