Abstract

The paper analyses the spatial distribution of the Swiss population using fractal concepts and unsupervised learning algorithms. The methodology builds a high-dimensional feature space from local growth curves, which are widely used in fractal dimension estimation, and then applies clustering algorithms to reveal patterns in the spatial distribution of the population. The term “unsupervised” also means that only general criteria (density, dimensionality, homogeneity) are used to construct the input feature space, without adding any supervised or expert knowledge. The approach provides comprehensive local information about the density and homogeneity/fractality of spatially distributed point patterns.

Highlights

  • The spatial distribution of the population depends on many environmental, social, and economic factors

  • The research follows a coherent methodology consisting of several steps: advanced exploratory data analysis, including the generation of simulated data in a validity domain; embedding of the original data into a feature space composed of local growth curves; study of clusterability; selection and calibration of the unsupervised learning algorithms; and understanding and qualitative interpretation of the results

  • High-resolution data, treated as a point process, were embedded into a 25-dimensional space via local growth curves computed for distances from 300 to 10,000 meters
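The embedding mentioned in the last highlight can be illustrated with a minimal sketch. It assumes that a local growth curve is the number of neighbouring points within distance r of each point, evaluated at 25 radii between 300 and 10,000 meters. The logarithmic spacing of the radii, the brute-force neighbour search, and the function names `radii_grid` and `local_growth_curves` are assumptions made for illustration, not details taken from the paper.

```python
import math

def radii_grid(r_min=300.0, r_max=10000.0, n=25):
    """Return n radii from r_min to r_max (log spacing is an assumption)."""
    ratio = (r_max / r_min) ** (1.0 / (n - 1))
    return [r_min * ratio ** k for k in range(n)]

def local_growth_curves(points, radii):
    """For each point, count the other points lying within each radius.

    Brute-force O(n^2) search: acceptable for a sketch, too slow for
    high-resolution data, where a spatial index would be needed.
    """
    curves = []
    for i, (xi, yi) in enumerate(points):
        dists = [math.hypot(xi - xj, yi - yj)
                 for j, (xj, yj) in enumerate(points) if j != i]
        # One feature per radius: the cumulative neighbour count N_i(r).
        curves.append([sum(d <= r for d in dists) for r in radii])
    return curves
```

Each point is thus described by a 25-component vector of cumulative counts. The level of the curve reflects local density, while its slope on a log-log scale relates to the local dimensionality of the pattern.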

Summary

Introduction

The spatial distribution of the population depends on many environmental, social, and economic factors. The research follows a coherent methodology consisting of several steps: advanced exploratory data analysis, including the generation of simulated data in a validity domain; embedding of the original data into a feature space composed of local growth curves; study of clusterability (clustering tendency); selection and calibration of the unsupervised learning algorithms; and understanding and qualitative interpretation of the results. One way to deal with this problem is to apply correction factors that take into account, for example, the shape of the region under study [1,2,3]. Another is to generate a CSR (complete spatial randomness) pattern in the validity domain and compare its results with those for the raw data. To this end, a simulated data set called CHCSR (Swiss Complete Spatial Randomness pattern) is introduced. It was generated with the same number of points as the raw population data, within the boundary of Switzerland and excluding the internal lakes. Both data sets are embedded into a high-dimensional feature space, where unsupervised algorithms are applied, following the methodology described below.
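The generation of a CSR pattern inside a validity domain, as described above, can be sketched with simple rejection sampling: candidate points are drawn uniformly in the bounding box of the boundary and kept only if they fall inside the boundary and outside every excluded area (the lakes). The polygon representation, the ray-casting test, and the names `point_in_ring` and `simulate_csr` are illustrative assumptions; the source does not state how CHCSR was actually produced.

```python
import random

def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given by ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def simulate_csr(n_points, boundary, holes=(), seed=0):
    """Draw n_points uniformly inside boundary, excluding the holes."""
    rng = random.Random(seed)
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n_points:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        # Reject candidates outside the domain or inside an excluded area.
        if point_in_ring(x, y, boundary) and not any(
                point_in_ring(x, y, h) for h in holes):
            points.append((x, y))
    return points
```

Because the simulated pattern has the same number of points and the same domain as the raw data, differences between the two sets of results can be attributed to the structure of the population pattern rather than to the shape of the region.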

Methodology and methods
Discussion and conclusions