A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Dawen Xia, Huaqing Li, Yantao Li, Zili Zhang, Wendong Wang, Xiaonan Lu

doi:10.1155/2018/2818251

Abstract

Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. Although existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome the inherent defects of Hadoop in handling taxi trajectory big data that consists of massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations from large-scale taxi trajectories, using massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we use MR-PFP to analyze the characteristics of taxi operations in both the spatial and temporal dimensions in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.

Highlights

In the era of data technology (DT) with “Internet +” and “big data ×,” large-scale data exhibiting the 5V characteristics (i.e., Volume, Velocity, Variety, Value, and Veracity) has been growing rapidly [1,2,3,4,5,6,7].
Based on the analysis mentioned above, two different strategies for massive small file processing can be selected: (I) Provided that memory consumption is an important factor in controlling the performance of the calculation, the Sequence Files (SF) method should be preferred when coping with massive small files on a Hadoop platform. (II) Provided that the execution efficiency of the entire mining process is the main consideration, priority should be given to the CFIF method.
We proposed a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm using massive small file processing strategies and applied it to analyze the spatiotemporal patterns of taxi operating characteristics with big trajectory data on a Hadoop distributed computing platform.
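
As a rough illustration of the ideas in the highlights above, the following Python sketch simulates two steps in memory: packing many small files into (name, contents) records, in the spirit of a Sequence File, and the map/reduce item-counting pass that precedes FP-growth tree construction. It is a minimal sketch under stated assumptions, not the authors' MR-PFP implementation: it does not use Hadoop, and the function names (pack_small_files, map_phase, reduce_phase, frequent_items, order_transaction) as well as the toy file names and items are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def pack_small_files(files):
    """Merge many small files into one list of (name, contents) records.

    This only mimics the key/value layout of a Sequence File; it is an
    in-memory stand-in, not the Hadoop format itself.
    """
    return sorted(files.items())


def map_phase(record):
    """Emit (item, 1) for each distinct item of every transaction line."""
    _name, contents = record
    for line in contents.splitlines():
        for item in set(line.split()):
            yield item, 1


def reduce_phase(pairs):
    """Sum the emitted counts per item (shuffle and reduce combined)."""
    counts = defaultdict(int)
    for item, n in pairs:
        counts[item] += n
    return dict(counts)


def frequent_items(files, min_support):
    """Return the items whose support count is at least min_support."""
    records = pack_small_files(files)
    pairs = chain.from_iterable(map_phase(r) for r in records)
    counts = reduce_phase(pairs)
    return {item: n for item, n in counts.items() if n >= min_support}


def order_transaction(line, counts):
    """Keep only frequent items, sorted by descending support, then name.

    FP-growth inserts transactions into the FP-tree in this order.
    """
    items = [i for i in set(line.split()) if i in counts]
    return sorted(items, key=lambda i: (-counts[i], i))


# Hypothetical toy data: each small file holds a few transactions,
# one per line, with whitespace-separated items.
files = {
    "taxi_001.txt": "zoneA morning\nzoneA evening",
    "taxi_002.txt": "zoneB morning\nzoneA morning",
}
counts = frequent_items(files, min_support=2)
print(counts)
print(order_transaction("zoneB morning zoneA", counts))
```

In a real deployment the map and reduce functions would run as distributed tasks, and the packing step would be handled by the chosen small file strategy (SF or CFIF) rather than by a Python dictionary.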

Summary

Introduction

In the era of data technology (DT) with “Internet +” and “big data ×,” large-scale data exhibiting the 5V characteristics (i.e., Volume, Velocity, Variety, Value, and Veracity) has been growing rapidly [1,2,3,4,5,6,7]. Taxi trajectory data is becoming one of the most significant sources of mobile trajectory data. It records the movement traces and operating status of taxicabs, and to a certain extent it reflects urban transportation conditions and captures the potentially rich driving experience of drivers [13, 14]. Traffic conditions in a transportation network are extremely uncertain because traffic is heterogeneous and dynamic, with nonlinear interactions between drivers and their environments [13].

Objectives

Methods

Results

Conclusion