Abstract

Hedonic price regressions have mainly been used for inference. In contrast, machine learning applied to big data has great potential for prediction. To help integrate these two strategies, this article proposes a machine learning approach to the regression analysis of big data, specifically real estate prices, for both inferential and predictive purposes. The methodology incorporates a new variable selection procedure called ‘incremental sample with resampling’ (MINREM). The methodology is tested on two cases. The first uses data from web advertisements for used homes in Colombia (61,826 observations). The second uses a sample (58,888 observations) of the 2011 Metropolitan American Housing Survey, obtained and prepared by a reference study. The methodology consists of two stages. The first selects the important variables using MINREM; the second follows the traditional training and validation procedure for machine learning, adding three activities. In both test cases, the methodology proves its value for obtaining highly parsimonious models that are stable across sample sizes, and for exploiting both the inferential and the predictive uses of the resulting regression functions. This paper contributes an original methodology for big data regression analysis.
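
The abstract does not spell out the MINREM steps, so the following is only a minimal sketch of the general idea the name suggests: draw repeated random subsamples at increasing sample sizes, fit a regression on each, and keep the variables that remain significant across all of them. Everything here is an assumption made for illustration, including the function names `selection_frequency` and `stable_variables`, the use of ordinary least squares with a |t| > 2 rule, the 90% retention share, and the synthetic data; it is not the authors' procedure.

```python
import numpy as np

def selection_frequency(X, y, sample_sizes, n_resamples=20, t_threshold=2.0, seed=0):
    """Share of resamples, per sample size, in which each variable has |t| > threshold.

    Hypothetical helper: OLS and the t-rule are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros((len(sample_sizes), p))
    for i, m in enumerate(sample_sizes):
        for _ in range(n_resamples):
            # Draw a random subsample of size m without replacement.
            idx = rng.choice(n, size=m, replace=False)
            Xs = np.column_stack([np.ones(m), X[idx]])  # add intercept column
            ys = y[idx]
            # Ordinary least squares fit on the subsample.
            beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
            resid = ys - Xs @ beta
            sigma2 = resid @ resid / (m - p - 1)
            cov = sigma2 * np.linalg.inv(Xs.T @ Xs)
            # t-statistics of the slope coefficients (intercept excluded).
            t = beta[1:] / np.sqrt(np.diag(cov)[1:])
            freq[i] += np.abs(t) > t_threshold
        freq[i] /= n_resamples
    return freq

def stable_variables(freq, min_share=0.9):
    """Indices of variables retained in at least min_share of resamples at every size."""
    return np.flatnonzero((freq >= min_share).all(axis=0))

# Synthetic illustration: only columns 0 and 2 actually drive the response.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(5000, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + data_rng.normal(size=5000)

freq = selection_frequency(X, y, sample_sizes=[200, 500, 1000])
selected = stable_variables(freq)
print(selected)
```

Requiring a variable to survive at every sample size, rather than only at the largest one, is one simple way to favour the parsimonious, size-stable models the abstract describes; a second stage would then train and validate a model on the retained variables only.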
