A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification

Yunzhe Liu, Alex Singleton, Daniel Arribas-Bel

doi:10.1080/10095020.2019.1621549

Open Access

https://doi.org/10.1080/10095020.2019.1621549

Journal: Geo-spatial Information Science
Publication Date: Jun 5, 2019
Citations: 32
License type: open-access

Affiliation: University of Liverpool

Abstract

A geodemographic classification aims to describe the most salient characteristics of a small area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Of particular debate are the choice and specification of input variables, with the objective of identifying inputs that add value while keeping the model parsimonious. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology that has the objective of identifying candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is exemplified in the context of variables from the UK 2011 Census, and its output is compared to the Office for National Statistics 2011 Output Area Classification (2011 OAC). Through the implementation of the proposed methodology, the quality of the cluster assignment was improved relative to 2011 OAC, as shown by a lower total within-cluster sum of squares score. Across the UK, more than 70.2% of the Output Areas (OAs) occupied by the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance within Scotland and Wales.
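The comparison metric named in the abstract, total within-cluster sum of squares, can be illustrated with a minimal sketch. The function name `total_wcss` and the toy coordinates below are assumptions made for illustration; they are not the paper's code or census data. The sketch sums, over all observations, the squared Euclidean distance to the centroid of the assigned cluster, so a lower score indicates more compact clusters.

```python
def total_wcss(rows, labels):
    """Sum of squared Euclidean distances from each row to its cluster centroid."""
    # Group the observations by their assigned cluster label.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)

    total = 0.0
    for members in groups.values():
        # Centroid = per-variable mean of the cluster's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            (x - c) ** 2 for row in members for x, c in zip(row, centroid)
        )
    return total


# Hypothetical toy data: four areas described by two variables.
rows = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = ["a", "a", "b", "b"]  # groups nearby areas together
poor = ["a", "b", "a", "b"]  # groups distant areas together

print(total_wcss(rows, good), total_wcss(rows, poor))
```

On this toy input the assignment that groups nearby areas yields the lower score, which is the sense in which one classification can be said to fit better than another.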

Highlights

A geodemographic classification aims to summarise the multidimensional socio-economic and built characteristics of a small area zonal geography, and is often referred to as a “neighbourhood” classification (Harris, Sleight, and Webber 2005)
Accepting the arguments that principal component analysis (PCA) can have an adverse effect when used to create inputs to a geodemographic classification (Alelyani, Tang, and Liu 2014; Harris, Sleight, and Webber 2005; Leventhal 2016), we would argue that PCA can still have utility as a tool for identifying appropriate input variables, which is the basis of the method we introduce
The automated variable selection process was implemented in an example of building a UK census geodemographic classification that would be broadly comparable to 2011 OAC

Summary

Introduction

A geodemographic classification aims to summarise the multidimensional socio-economic and built characteristics of a small area zonal geography, and is often referred to as a “neighbourhood” classification (Harris, Sleight, and Webber 2005). Clustering performance can be substantially improved by reducing the number of variables, which mitigates the “curse of dimensionality” (Alelyani, Tang, and Liu 2014; Guyon and Elisseeff 2003; Pacheco 2015; Rojas 2015). Taking such perspectives into consideration, a typical objective of variable selection is to achieve input parsimony, that is, the identification of the smallest subset of input variables that captures the most variation within the original dataset (Debenham 2002; Gale et al. 2016; Harris, Sleight, and Webber 2005).
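To make the idea of input parsimony concrete, the following is a minimal sketch of one generic loading-based heuristic for choosing variables with PCA. It is not the authors' procedure, whose details are not given in this summary. The function name `select_variables_by_pca`, the `variance_target` threshold, and the synthetic variables are all assumptions for illustration. The sketch retains the fewest components that reach the variance target, then keeps, for each retained component, the not-yet-chosen variable with the largest absolute loading.

```python
import numpy as np


def select_variables_by_pca(X, names, variance_target=0.9):
    """Keep one representative variable per retained principal component."""
    X = np.asarray(X, dtype=float)
    # PCA on the correlation matrix is equivalent to PCA on standardised variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Fewest components whose cumulative variance share reaches the target.
    share = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(share, variance_target)) + 1, len(eigvals))

    selected = []
    for j in range(k):
        # Variables ordered by absolute loading on component j, largest first.
        ranked = np.argsort(-np.abs(eigvecs[:, j]))
        pick = next(int(i) for i in ranked if int(i) not in selected)
        selected.append(pick)
    return [names[i] for i in selected]


# Hypothetical synthetic data: five noisy variables driven by two latent factors.
rng = np.random.default_rng(0)
n = 500
factor_a = rng.normal(size=n)
factor_b = rng.normal(size=n)
names = ["a1", "a2", "a3", "b1", "b2"]
X = np.column_stack(
    [factor_a + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [factor_b + 0.3 * rng.normal(size=n) for _ in range(2)]
)

selected = select_variables_by_pca(X, names, variance_target=0.9)
print(selected)
```

In this synthetic setting the five redundant inputs reduce to two, one standing in for each latent factor. A real application would also need to decide how to treat skewed or compositional census variables before computing correlations, which this sketch leaves out.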

Selecting variables in national classifications

Automated variable selection using PCA

Case study application

Describing the derived classification

Classification performance and comparison to 2011 OAC

Conclusions

Findings

Notes on contributors

Similar Papers

Geodemographic travel to work flows into London, UK
Chris Gale ... David Martin
International Conference on GIScience Short Paper Proceedings | VOL. 1
01 Jan 2015

OACoder: Postcode Coding Tool
...
Journal of Open Research Software | VOL. 1
08 Oct 2013

An Open Source Geodemographic Classification of Small Areas in the Republic of Ireland
Christopher Brunsdon ... Janette E Rigby
Applied Spatial Analysis and Policy | VOL. 11
29 Oct 2016

The internal structure of Greater London: a comparison of national and regional geodemographic models
Alex David Singleton ... Paul Longley
Geo: Geography and Environment | VOL. 2
01 Jun 2015
