Abstract

As users of mobile devices make phone calls, browse the web, or use an app, they routinely generate large volumes of data that are a potentially useful source for investigating human behavior in space. However, because such data are usually collected only as a by-product, they often lack stringent experimental design and ground truth, which makes it challenging to interpret them and to derive valid behavioral conclusions. Here, we propose an unsupervised, data-driven approach to identify different user types based on high-resolution human movement data collected from a smartphone navigation app, in the absence of ground truth. We capture spatio-temporal footprints of users, characterized by meaningful summary statistics, which are then used in an unsupervised step to identify user types. Based on an extensive dataset of users of the mobile navigation app Sygic in Australia, we show how the proposed methodology allows us to identify two distinct groups of users: ‘travelers’, visiting different areas with distinct, salient characteristics, and ‘locals’, covering shorter distances and revisiting many of their locations. We verify our approach by relating user types to space use: we find that travelers and locals prefer to visit different locations in the Australian cities Sydney and Melbourne, as suggested independently by other studies. Although we use high-resolution GPS data, the proposed methodology is potentially transferable to low-resolution movement data (e.g. Call Detail Records), since we rely only on summary statistics.

Highlights

  • Today, a large part of the data capturing human spatial mobility and behavior is generated as a by-product of digital or online activities, for instance, during mobile phone use (Csáji et al. [9], Ahas et al. [1]) or by taxi dispatching systems (Gong et al. [18])

  • How can we reliably characterize subgroups of a moving population in the absence of verifiable ground truth data? In this article, we present a fully data-driven approach to identify distinct subgroups of moving populations based on exhaust human movement data (EHMD) collected from a mobile navigation app

  • We label and interpret in detail only the two most salient clusters, Clusters 2 and 3. We argue that this is an important aspect of unsupervised learning: although five clusters best capture the variance in the spatio-temporal footprints, an analyst should not expect that all clusters are semantically meaningful (Clusters 1 and 4) or that all clusters show salient behavior that can be interpreted in a straightforward way (Cluster 5)


Summary

Introduction

A large part of the data capturing human spatial mobility and behavior is generated as a by-product of digital or online activities, for instance, during mobile phone use (Csáji et al. [9], Ahas et al. [1]) or by taxi dispatching systems (Gong et al. [18]). We compute a set of meaningful behavioral spatio-temporal features for each user with a spatio-temporal footprint (feature extraction). These features do not relate to a specific absolute location in space and time, but rather describe the relative movement of a user, for example the average extent of the area a user has covered in a single day. We perform a principal component analysis (PCA) to single out the most informative combinations of features. In the proposed methodology, the feature extraction and dimensionality reduction steps make explicit the human behavior that is implicit in exhaust human movement data, while the unsupervised learning by clustering allows us to draw inferences and interpretations without ground truth. We have designed an iterative data-driven approach to identify an optimal clustering method, together with an optimal number of clusters and an optimal number of principal components, to identify user types in the data.
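
To make the feature extraction step more concrete, the following is a minimal sketch of how one location-independent feature, the average extent of the area a user covers in a single day, might be computed from raw GPS records. It is not the implementation used in the study: the record layout `(user_id, iso_timestamp, lat, lon)`, the function names, and the choice of the bounding-box diagonal as a proxy for "extent" are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_extent_km(points):
    """Diagonal of the bounding box around one day's points.

    Assumption: this is only a simple proxy for 'extent'; it ignores
    bounding boxes that cross the antimeridian.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return haversine_km(min(lats), min(lons), max(lats), max(lons))


def mean_daily_extent(records):
    """Return {user_id: average daily extent in km}.

    records: iterable of (user_id, iso_timestamp, lat, lon) tuples
    (hypothetical schema, not the format of the original dataset).
    """
    # Group points by user and calendar day.
    per_user_day = defaultdict(list)
    for user_id, ts, lat, lon in records:
        day = datetime.fromisoformat(ts).date()
        per_user_day[(user_id, day)].append((lat, lon))

    # Collect one extent value per active day, then average per user.
    per_user = defaultdict(list)
    for (user_id, _day), points in per_user_day.items():
        per_user[user_id].append(daily_extent_km(points))
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}


# Made-up example records: one user staying within Sydney,
# one moving between Sydney and Melbourne on the same day.
records = [
    ("local", "2019-01-01T08:00:00", -33.870, 151.210),
    ("local", "2019-01-01T18:00:00", -33.880, 151.220),
    ("local", "2019-01-02T08:00:00", -33.870, 151.210),
    ("traveler", "2019-01-01T08:00:00", -33.870, 151.210),
    ("traveler", "2019-01-01T20:00:00", -37.810, 144.960),
]
features = mean_daily_extent(records)
```

Because the feature depends only on distances within a day, it carries no absolute position. Several such summary statistics per user would form the feature matrix passed on to the PCA and clustering steps.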

[Figure: Variation of stop duration; panels: hourly and daily.]
