Abstract
The availability of reliable socioeconomic data is critical for the design of urban policies and the implementation of location-based services; however, their temporal and geographical coverage often remains scarce. We explore the potential of insurance customer data to predict socioeconomic indicators of Swiss municipalities. First, we define a feature space by aggregating individual customer data at the city level along several behavioral and user-profile dimensions. Second, we collect official statistics shared by the Swiss authorities across a wide spectrum of categories: Population, Transportation, Work, Space and Territory, Housing, and Economy. Third, we adopt two spatial regression models, exploring both global and local geographical dependencies, to investigate the predictability of these statistics. Results consistently show a correlation between insurance customer characteristics and official socioeconomic indexes. Performance fluctuates depending on the category, with R² > 0.6 for several target variables under 5-fold cross-validation. As a case study, we focus on predicting the percentage of the population using public transportation, and we discuss the implications at the regional level. We believe that this methodology can support official statistical offices and could open up new opportunities for the characterization of socioeconomic traits at highly granular spatial and temporal scales.
Highlights
National Statistical Institutes (NSIs) play an important role in modern societies by releasing precise information on social, environmental, or economic activities [1] in the form of a census.
To reduce model complexity and prevent overfitting, LassoLarsIC fits a Least Absolute Shrinkage and Selection Operator (LASSO) [43] model and relies on Least Angle Regression (LARS) [44] and the Bayesian Information Criterion (BIC) [45] for model selection, seeking the right trade-off between goodness of fit and model complexity.
After the analysis of determinants, we compare the performance of the global (SLM) and local (GWR) spatial models against a standard multivariate linear regressor (OLS) to quantify the benefit of exploiting spatial relations.
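The feature-selection step and the OLS baseline mentioned in the highlights above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's LassoLarsIC with the BIC criterion and a plain linear regressor scored by 5-fold cross-validated R². The data are synthetic stand-ins (200 hypothetical municipalities, 15 hypothetical features), not the insurance or census data used in the paper, and the spatial models (SLM, GWR) are not shown.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC, LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for municipality-level features (NOT the paper's data):
# 200 rows, 15 candidate features, of which only three drive the target.
n_samples, n_features = 200, 15
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# LASSO fitted along the LARS path; the regularization strength is the one
# that minimizes the Bayesian Information Criterion.
selector = LassoLarsIC(criterion="bic").fit(X, y)
selected = np.flatnonzero(selector.coef_)  # indices of retained features

# Baseline: ordinary least squares on the retained features,
# scored with 5-fold cross-validated R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scores = cross_val_score(
    LinearRegression(), X[:, selected], y, cv=cv, scoring="r2"
)
mean_r2 = r2_scores.mean()

print(f"selected feature indices: {selected.tolist()}")
print(f"mean 5-fold R^2: {mean_r2:.3f}")
```

For brevity, the selection here is run once on the full data set; a stricter evaluation would repeat the selection inside each fold so that the held-out rows never influence which features are kept.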
Summary
National Statistical Institutes (NSIs) play an important role in modern societies by releasing precise information on social, environmental, or economic activities [1] in the form of a census. The census records key aspects of the population living in an area, such as age, gender, and income, and it enables predictive scenarios that estimate the need for schools, residential homes, or public services. The production of official statistics on socioeconomic status is undergoing a significant modernization, nationally and internationally [3]. This is due to the opportunities offered by the use of new data sources, such as mobile phone data [4], social media [5], satellite.