Abstract

We present and test a sequential learning algorithm for predicting human mobility that leverages large datasets of sequences to improve prediction accuracy, in particular for users with a short and non-repetitive data history, such as tourists in a foreign country. When there is little evidence of a user's past behavior, predicting the next location is difficult; the algorithm compensates by drawing on the sequences of other users in the same system, which provide redundant records of typical behavioral patterns. We test the method on a dataset of 10 million roaming mobile phone users in a European country. The average prediction accuracy is significantly higher than that of individual sequence prediction algorithms, chiefly constant-order Markov models built from the user's own data, which previous studies of human mobility have shown to achieve high accuracy. The proposed algorithm can be applied to improve any sequential prediction task for which a sufficiently rich and diverse dataset of sequences is available.
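The contrast drawn in the abstract, between a model built only from the user's own data and one that also uses other users' sequences, can be illustrated with a small sketch. It assumes one simple way of leveraging other users: a first-order Markov prediction that uses the user's own transition counts when the current context has been seen, and otherwise backs off to counts pooled over other users' sequences. This backoff rule and the names `transition_counts` and `predict_next` are hypothetical; it is not the algorithm presented in the paper.

```python
from collections import Counter, defaultdict


def transition_counts(sequences, order=1):
    """Count, for every context of `order` consecutive locations, how often
    each next location follows it across the given sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts


def predict_next(user_history, other_sequences, order=1):
    """Hypothetical backoff predictor: use the user's own counts if the current
    context appears there, otherwise the counts pooled over other users.
    Returns None if neither source has seen the context."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if len(user_history) < order:
        return None
    context = tuple(user_history[-order:])
    for counts in (transition_counts([user_history], order),
                   transition_counts(other_sequences, order)):
        successors = counts.get(context)  # .get does not create empty entries
        if successors:
            return successors.most_common(1)[0][0]
    return None


# Toy sequences of other users (illustrative only, not data from the study)
others = [
    ["airport", "hotel", "museum", "hotel"],
    ["airport", "hotel", "beach"],
    ["station", "hotel", "museum"],
]
tourist = ["airport", "hotel"]  # short history; "hotel" was never left before
print(predict_next(tourist, others))  # → museum
```

Here the tourist's own history contains no transition out of "hotel", so an individual model alone would have nothing to predict; the pooled counts (museum twice, beach once) supply the answer. A user whose own history already covers the context keeps their personal pattern, because the own counts are consulted first.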

Highlights

  • The problem of algorithmic prediction of human mobility has received significant attention in the literature in recent years, for its potential applications and its inherent theoretical value

  • The data was provided by a major telecom operator and consists of an anonymised sample covering seven months of call detail records (CDRs) from more than 10 million roamers in a European country

  • Each CDR records the principal antenna to which a mobile device is connected during a phone call, SMS communication or data connection
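Since each CDR carries only a user, a time and an antenna, the location sequences that a mobility predictor consumes have to be assembled from the records. The following is a minimal sketch, assuming a hypothetical `CDR` record with just those three fields: group records by user, sort them by time, and collapse consecutive records at the same antenna into a single visit. The collapsing rule and the names `CDR` and `location_sequences` are assumptions for illustration, not the preprocessing described in the paper.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class CDR:
    """Hypothetical minimal call detail record: who, when, and which antenna."""
    user_id: str
    timestamp: datetime
    antenna_id: str


def location_sequences(records):
    """Build one antenna sequence per user from unordered CDRs.

    Assumed preprocessing: sort by (user, time), then collapse consecutive
    records at the same antenna, e.g. A, A, B, A -> A, B, A.
    """
    sequences = {}
    ordered = sorted(records, key=attrgetter("user_id", "timestamp"))
    for user, user_records in groupby(ordered, key=attrgetter("user_id")):
        antennas = [r.antenna_id for r in user_records]
        sequences[user] = [antenna for antenna, _ in groupby(antennas)]
    return sequences


# Toy records (illustrative only), deliberately out of time order
records = [
    CDR("u1", datetime(2024, 5, 1, 9, 0), "A"),
    CDR("u1", datetime(2024, 5, 1, 8, 0), "A"),
    CDR("u1", datetime(2024, 5, 1, 12, 0), "B"),
    CDR("u2", datetime(2024, 5, 1, 10, 0), "C"),
    CDR("u1", datetime(2024, 5, 1, 18, 0), "A"),
]
print(location_sequences(records))
# {'u1': ['A', 'B', 'A'], 'u2': ['C']}
```

Collapsing repeats means the sequence records moves between antennas rather than how often a device was active; whether that is appropriate depends on what the predictor is meant to forecast.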


Summary

Introduction

The problem of algorithmic prediction of human mobility has received significant attention in the literature in recent years, both for its potential applications and for its inherent theoretical value. […] In the first class we include algorithms, such as Markov models, that use only the single user's past locations, without any other information, to estimate the next location. This individual sequence prediction is closely related to lossless compression of sequential data [1,2,3]. When an agent is added to the system, a prediction algorithm that uses only that agent's past data must wait until a statistically significant amount of data has accumulated before it can produce reliable predictions. When the sequence of states of an agent is not stationary (that is, the patterns of states of the agent cannot be modeled by a probability distribution that is immutable in time), a …

PLOS ONE | DOI:10.1371/journal.pone.0170907 January 30, 2017
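The individual-sequence approach described above can be sketched as a constant-order Markov predictor: for each context of the last k locations, it counts which location followed that context in the user's own history and predicts the most frequent one. This is a minimal illustration, assuming a most-frequent-successor rule; the class name `MarkovPredictor` is hypothetical and this is not the implementation used in the paper. It also shows the waiting problem: until the current context has been observed in the user's own data, the predictor has nothing to offer.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """Hypothetical order-k Markov predictor trained only on one user's
    own sequence of locations."""

    def __init__(self, order=1):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # context (tuple of the last k locations) -> counts of the next location
        self.counts = defaultdict(Counter)
        self.history = []

    def update(self, location):
        """Observe the next location and update the transition counts."""
        if len(self.history) >= self.order:
            context = tuple(self.history[-self.order:])
            self.counts[context][location] += 1
        self.history.append(location)

    def predict(self):
        """Return the most frequent successor of the current context, or None
        if the history is too short or the context was never seen (cold start)."""
        if len(self.history) < self.order:
            return None
        context = tuple(self.history[-self.order:])
        successors = self.counts.get(context)
        if not successors:
            return None
        return successors.most_common(1)[0][0]


p = MarkovPredictor(order=1)
for loc in ["hotel", "museum", "hotel", "museum", "hotel"]:
    p.update(loc)
print(p.predict())  # → museum

newcomer = MarkovPredictor(order=1)
newcomer.update("station")
print(newcomer.predict())  # → None: no data yet for the context ("station",)
```

A higher order captures longer routines but splits the same amount of data across more contexts, so a new or non-repetitive user sees unseen contexts even more often.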

