Abstract

Timely, high-resolution measures of socioeconomic outcomes are critical for policy making and evaluation, but they are hard to obtain reliably. With machine learning and cheaply available data such as social media and nightlight data, it is now possible to predict such indices at fine granularity. This paper demonstrates an adaptive way to measure the time trend and spatial distribution of housing activeness using multiple easily accessible datasets. We first identify regional activeness status from energy consumption data and then match it with nightlight and land-use data. We introduce factor-adjusted regularization methods for prediction (FarmPredict), which address dependence and collinearity among predictors by effectively lifting the prediction space; the approach applies to all machine learning algorithms. Heterogeneity in the big data is mitigated through the land-use data. FarmPredict allows us to extend the regional results to the city level, explaining 75% of the out-of-sample spatial and temporal variation in house usage. Because energy is indispensable for life, our method is highly transferable and requires only publicly accessible data. Our paper demonstrates the power of machine learning for understanding socioeconomic outcomes when census and survey data are costly or unavailable.
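To make the idea of "lifting the prediction space" concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the common factors are estimated by principal components, that the factor part is fit by least squares, and that the idiosyncratic part is fit by a lasso solved with plain coordinate descent. The function names (`farm_predict_fit`, `farm_predict_predict`, `lasso_cd`) and the parameters `n_factors` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used by the lasso update.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # partial residual without feature j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # restore the full residual
    return beta

def farm_predict_fit(X, y, n_factors, lam):
    # Assumption: predictors follow an approximate factor model,
    # X = F B^T + U, with few common factors F and weakly dependent U.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Estimate the loading directions by principal components (via SVD).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_factors].T                    # p x K orthonormal directions
    F = Xc @ V                              # n x K estimated factors
    U = Xc - F @ V.T                        # n x p idiosyncratic components
    # Lifted regression: y on (F, U) instead of on the collinear X.
    gamma, *_ = np.linalg.lstsq(F, yc, rcond=None)
    beta = lasso_cd(U, yc - F @ gamma, lam)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "V": V, "gamma": gamma, "beta": beta}

def farm_predict_predict(model, X_new):
    # Split new predictors into factor and idiosyncratic parts, then combine.
    Xc = X_new - model["x_mean"]
    F = Xc @ model["V"]
    U = Xc - F @ model["V"].T
    return model["y_mean"] + F @ model["gamma"] + U @ model["beta"]
```

In this sketch the strongly correlated columns of `X` are replaced by a few factors plus weakly correlated residuals, which is what lets a sparse method such as the lasso behave well. Any other learner could be substituted for the least-squares and lasso steps; the choice here is only for brevity.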
