A Machine Learning Approach to Price Indices: Applications in Real Estate

Felipe Dutra Calainho, Marc Francke, Alex Van De Minne

doi:10.2139/ssrn.3689632

Abstract

This paper proposes a methodology for using machine learning regression models to create price indices. We develop six commercial real estate price indices for New York City from 2000 to 2019. The regression models used are eXtreme Gradient Boosting Tree (XGBT), Support Vector Regression (SVR), and averaged Neural Networks (avNNet), with Ordinary Least Squares (OLS) as the benchmark for comparison. We consider two main index methodologies: a chained index, constructed using out-of-sample data, and a pooled index, constructed using in-sample data. Each methodology is divided into subcategories that use Paasche-like and Laspeyres-like index formulations. We also examine the optimal size, in years, of the training window used to build the index. The results show that the machine learning approach produces lower estimation errors overall. Nevertheless, lower estimation errors do not always yield a more stable index. Additionally, the out-of-sample estimation error of all models, including the benchmark, is sensitive to the training window size.
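To make the Laspeyres-like and Paasche-like distinction concrete, the following is a minimal sketch of a standard imputation-style index built from two fitted pricing models. It is not taken from the paper: the function names, the toy linear "models", and the baskets of property sizes are all hypothetical stand-ins, and the paper's exact formulations may differ. A Laspeyres-like index reprices the base-period basket of properties with both models, while a Paasche-like index reprices the current-period basket.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]  # maps property characteristics to a predicted price


def laspeyres_like(model_0: Model, model_t: Model, basket_0: Sequence[Features]) -> float:
    """Ratio of predicted values of the BASE-period basket under the two models."""
    return sum(model_t(x) for x in basket_0) / sum(model_0(x) for x in basket_0)


def paasche_like(model_0: Model, model_t: Model, basket_t: Sequence[Features]) -> float:
    """Ratio of predicted values of the CURRENT-period basket under the two models."""
    return sum(model_t(x) for x in basket_t) / sum(model_0(x) for x in basket_t)


# Hypothetical stand-ins for fitted regressors (any model with a predict step
# could take their place). The single feature is floor area in square feet.
def model_2000(x: Features) -> float:
    return 50_000.0 + 100.0 * x[0]


def model_2001(x: Features) -> float:
    return 40_000.0 + 120.0 * x[0]


# Hypothetical baskets: properties transacted in each period.
basket_2000 = [(1000.0,), (2500.0,)]
basket_2001 = [(1500.0,), (4000.0,)]

index_l = laspeyres_like(model_2000, model_2001, basket_2000)
index_p = paasche_like(model_2000, model_2001, basket_2001)
print(f"Laspeyres-like: {index_l:.4f}, Paasche-like: {index_p:.4f}")
```

Because the two toy models differ in both intercept and slope, the two indices disagree whenever the baskets differ in composition, which is why the choice of formulation matters.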

Full Text