Probabilistic Grid-Based Approaches for Privacy-Preserving Data Mining on Moving Object Trajectories

Győző Gidófalvi, Xuegang Huang, Torben Bach Pedersen

doi:10.1201/b10373-17

Abstract

The efficient management of moving object databases has gained much interest in recent years due to the development of mobile communication and positioning technologies. Moving objects are typically represented by their trajectories. Much work in the database community has focused on indexing, query processing, and data mining of moving object trajectories, but little attention has been paid to the preservation of privacy in this setting. In many applications, such as intelligent transport systems (ITS) and fleet management, floating car data (FCD), i.e., tracked vehicle locations, are collected and used for mining traffic patterns. For instance, by mining vehicle trajectories in urban transportation networks over time, one can identify dense areas (roads, junctions, etc.) and use this knowledge to predict traffic congestion. By mining the periodic movement patterns of individual drivers (objects following similar routes at similar times), personalized, context-aware services can be delivered.
However, exposing the location and trajectory data of moving objects to application servers threatens the location privacy of individual users. For example, a service provider with access to trajectory data can study a user’s personal habits. The naive approach of keeping the user’s identity secret by hiding or encoding the user’s ID does not work: frequent user locations, such as the home and office addresses, can be found by first self-correlating the user’s trajectory and then cross-referencing the frequent locations with publicly available spatial data sources, e.g., Yellow Pages, thereby revealing the user’s identity. In recent years, privacy-preserving data mining has emerged as a field of study because advances in data collection and dissemination technologies force existing data mining algorithms to be reconsidered from the point of view of privacy protection.
Various privacy concepts and measures, such as k-anonymity and l-diversity, and related privacy-preservation techniques, such as perturbation, condensation, generalization, and data hiding with conceptual reconstruction, have been proposed in the general setting. However, their extension or applicability to the spatio-temporal domain, in particular to the privacy-preserving data mining of moving object trajectories, has not been investigated. Hence, the chapter focuses on the unique challenge of obtaining detailed, accurate patterns from anonymized location and trajectory data. After a thorough review of research related to privacy-preserving data mining on moving object trajectories, the chapter first proposes a novel anonymization model for preserving location privacy on moving object trajectories. In this model, users specify their location privacy requirements in terms of anonymization rectangles and location probabilities, which intuitively state how precisely they are willing to be located in given areas. Second, the chapter shows a common problem with existing methods that are based on the notion of k-anonymity: an adversary can infer a frequently occurring location of a user, e.g., the home address, by correlating several observations. Third, the chapter presents an effective grid-based framework for data collection and mining over the anonymized trajectory data. The framework is based on the notions of anonymization grids and anonymization partitionings, which allow effective management of both the user-specified location privacy requirements and the anonymized trajectory data. Along with the framework, three policies for constructing anonymization rectangles are presented: common regular partitioning, individual regular partitioning, and individual irregular partitioning. All three policies avoid the aforementioned privacy problem of existing methods.
Fourth, the chapter presents a client-server architecture for an efficient implementation of the system. A distinguishing feature of the architecture is that anonymization is performed solely on the client, which removes the need for trusted middleware. Fifth, the chapter presents techniques for two basic trajectory data mining operations, namely finding dense spatio-temporal areas and finding frequent routes. The techniques are based on probabilistic counting. Finally, extensive experiments with prototype implementations demonstrate the effectiveness of the approach by comparing the presented solutions to their non-privacy-preserving equivalents. The experiments show that the framework still allows most patterns to be found, even when privacy is preserved.
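
The following is a minimal illustrative sketch of how client-side anonymization on a grid and server-side probabilistic counting of dense areas might fit together. It is not the chapter's implementation. The function names (`anonymize`, `expected_counts`, `dense_cells`), the integer cell sizes, the assumption that each user's rectangle is aligned to a base grid, and the assumption that a user is uniformly likely to be anywhere inside the reported rectangle are all hypothetical choices made for illustration; time is ignored entirely.

```python
from collections import defaultdict
from math import floor


def anonymize(x, y, cell_size):
    """Client side (assumed policy): report only the grid-aligned rectangle
    of side `cell_size` that contains the exact location (x, y)."""
    cx = floor(x / cell_size)
    cy = floor(y / cell_size)
    return (cx * cell_size, cy * cell_size,
            (cx + 1) * cell_size, (cy + 1) * cell_size)


def expected_counts(rectangles, base_size):
    """Server side: spread one unit of probability per reported rectangle
    uniformly over the base cells it covers (uniformity is an assumption).
    Rectangle bounds must be integer multiples of `base_size`."""
    counts = defaultdict(float)
    for x0, y0, x1, y1 in rectangles:
        cells = [(i, j)
                 for i in range(x0 // base_size, x1 // base_size)
                 for j in range(y0 // base_size, y1 // base_size)]
        for cell in cells:
            counts[cell] += 1.0 / len(cells)
    return counts


def dense_cells(counts, min_count):
    """Return base cells whose expected number of objects reaches min_count."""
    return sorted(cell for cell, value in counts.items() if value >= min_count)
```

For example, with a base grid of side 1, two users reporting 2-by-2 rectangles over the same area and one user reporting a 1-by-1 rectangle at the origin yield an expected count of 1.5 for base cell (0, 0) and 0.5 for the three neighbouring cells, so only (0, 0) is dense at a threshold of 1.0. The server never sees an exact location, only rectangles chosen on the client.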
