Quality-aware trajectory processing using significant locations

Han Su

doi:10.14264/uql.2015.699

Abstract

Driven by major advances in sensor technology, GPS-enabled mobile devices and wireless communications, a large amount of trajectory data is now generated and managed in scores of application domains. Over the last decade, this has inspired a tremendous research effort to analyze large-scale trajectory data from a variety of perspectives. However, data quality issues persist in trajectory data and in various kinds of trajectory-based services, mainly at three levels: (1) the data level, e.g., heterogeneous and uncertain trajectory data; (2) the service level, e.g., the inability to capture latent factors behind trajectory data; and (3) the representation level, e.g., the lack of semantic meaning in existing representation techniques for trajectory data. Such quality issues can be tied to the processes and limited techniques that generate trajectory data, and to the way trajectory data is stored and presented. In this thesis, we tackle these quality issues at all three levels in a systematic way, using sampling of significant locations. Below is a brief description of our contributions.

Data level: We pioneer a systematic approach to trajectory calibration, a process that transforms a heterogeneous trajectory dataset into one with (almost) unified sampling strategies. Trajectories in a practical database are always heterogeneous: a trajectory is a discrete approximation of the original continuous path, created by sampling locations periodically, so different sampling strategies result in heterogeneous trajectory data. This heterogeneity has a negative impact on the effectiveness of similarity-based trajectory analysis, which is the foundation of most trajectory data processing tasks.
Our solution calibrates trajectories in two steps: (1) align the raw trajectories to a set of significant locations; and (2) interpolate the missing significant locations into the aligned trajectory. Extensive experiments on a large-scale real trajectory dataset empirically demonstrate that the calibration system can significantly improve the effectiveness of most popular similarity measures for heterogeneous trajectories.

Service level: We propose CrowdPlanner, a crowd-based route recommendation system that asks human workers to evaluate candidate routes recommended by different sources and methods, and determines the best route based on their feedback. Route recommendation is one of the most important trajectory-based applications. The routes recommended by the big-thumb service providers try to give users the best traveling experience according to criteria such as traveling distance, traveling time and traffic conditions. However, previous research shows that even these routes can deviate significantly from the routes traveled by experienced drivers. This means that travelers' route preferences are influenced by many latent and dynamic factors that are hard to model exactly with pre-defined formulas. CrowdPlanner therefore leverages the crowd's knowledge to improve recommendation quality. Two components that significantly affect system performance are carefully designed: (1) the task generation component, which efficiently generates tasks that are simple to answer; and (2) the worker selection component, which quickly identifies a set of appropriate domain experts to answer the questions in a timely and accurate way. We deployed the system and conducted extensive experiments with several workers, users and queries in real scenarios.
The results demonstrate that CrowdPlanner can recommend the most satisfactory routes efficiently in most cases.

Representation level: We generate a short text to enhance the semantic meaning of a trajectory. A raw trajectory, in the form of a sequence of timestamped locations, makes little sense to humans without a semantic representation. We therefore aim to facilitate human understanding of a raw trajectory by automatically generating a short text that describes it. Formulating this task as a problem of adaptive trajectory segmentation and feature selection, we propose a partition-and-summarization framework. In the partition phase, we first define a set of features for each trajectory segment and then derive an optimal partition, with the aim of making the segments within each partition as homogeneous as possible in terms of their features. In the summarization phase, for each partition, we select the most interesting features by comparing them against the common behaviors of historical trajectories on the same route, and we generate short text descriptions for these features. Comprehensive experiments empirically demonstrate that the generated textual descriptions reflect the most significant features of trajectories and are easier for humans to understand.
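
As a rough illustration of the two-step calibration described under the data level (align, then interpolate), the following minimal Python sketch may help. It is not the thesis's implementation. It assumes the significant locations are already known and listed in order along a single straight route, uses planar Euclidean distance, and fills gaps by walking the anchor list in order. The names `ANCHORS`, `align`, `interpolate`, `calibrate` and the `max_dist` threshold are all hypothetical.

```python
from math import hypot

# Hypothetical significant locations (anchor points), listed in order along one route.
ANCHORS = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]


def align(trajectory, anchors, max_dist=0.5):
    """Step 1: snap each raw sample to the index of its nearest anchor.

    Samples farther than max_dist from every anchor are dropped, and
    consecutive samples that map to the same anchor are merged.
    """
    aligned = []
    for x, y in trajectory:
        dist, idx = min(
            (hypot(x - ax, y - ay), i) for i, (ax, ay) in enumerate(anchors)
        )
        if dist <= max_dist and (not aligned or aligned[-1] != idx):
            aligned.append(idx)
    return aligned


def interpolate(aligned):
    """Step 2: insert the anchors skipped between consecutive aligned anchors."""
    if not aligned:
        return []
    complete = [aligned[0]]
    for idx in aligned[1:]:
        prev = complete[-1]
        step = 1 if idx > prev else -1
        complete.extend(range(prev + step, idx + step, step))
    return complete


def calibrate(trajectory, anchors=ANCHORS, max_dist=0.5):
    """Rewrite a raw trajectory as a sequence of anchor coordinates."""
    return [anchors[i] for i in interpolate(align(trajectory, anchors, max_dist))]


# Two samplings of the same path: one sparse, one dense.
sparse = [(0.1, 0.1), (2.2, -0.1), (3.9, 0.0)]
dense = [(0.0, 0.05), (0.9, 0.0), (1.1, 0.1), (2.0, 0.0), (3.1, 0.0), (4.0, -0.1)]
print(calibrate(sparse) == calibrate(dense))
```

Under these assumptions, the sparse and dense samplings are rewritten to the same anchor sequence, which is the property that makes similarity measures comparable across heterogeneous trajectories. A realistic system would need a road network or historical trajectories to decide which significant locations lie between two aligned points.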
