This paper employs machine learning to quantify the value of “soft” information contained in real estate property descriptions. Textual descriptions contain information that traditional hedonic attributes cannot capture. A one standard deviation increase in the uniqueness of a property based on this “soft” information leads to a 15% increase in property sale price in a hedonic price model and a 10% increase in a repeat sales price model. The effects in the hedonic model appear to arise through two channels: the unobserved quality of the housing unit, and the market power of the housing unit relative to competing properties. The effects in the repeat sales model appear to be driven entirely by the market power of the unit. Further, an annual hedonic price index ignoring our measure of unobserved quality overstates real estate prices by between 10% to 23% and mistimes the stabilization of housing prices following the Great Recession. Similar, but smaller effects, are observed for the repeat sales price index.
Read full abstract