Abstract

Discovering periodic-frequent patterns in temporal databases is a challenging problem of great importance in many real-world applications. Though several algorithms were described in the literature to tackle the problem of periodic-frequent pattern mining, most of these algorithms use the traditional horizontal (or row) database layout, that is, either they need to scan the database several times or do not allow asynchronous computation of periodic-frequent patterns. As a result, this kind of database layout makes the algorithms for discovering periodic-frequent patterns both time and memory inefficient. One cannot ignore the importance of mining the data stored in a vertical (or columnar) database layout. It is because real-world big data is widely stored in columnar database layout. With this motivation, this paper proposes an efficient algorithm, Periodic Frequent-Equivalence CLass Transformation (PF-ECLAT), to find periodic-frequent patterns in a columnar temporal database. Experimental results on sparse and dense real-world and synthetic databases demonstrate that PF-ECLAT is memory and runtime efficient and highly scalable. Finally, we demonstrate the usefulness of PF-ECLAT with two case studies. In the first case study, we have employed our algorithm to identify the geographical areas in which people were periodically exposed to harmful levels of air pollution in Japan. In the second case study, we have utilized our algorithm to discover the set of road segments in which congestion was regularly observed in a transportation network.

Highlights

  • Depending on the layout of recording data on a storage device, databases can be broadly classified into two types, namely row databases and columnar databases

  • Row databases are suitable for online transaction processing (OLTP), while columnar databases are suitable for online analytical processing (OLAP)

  • We propose a novel and generic Equivalence CLass Transformation (ECLAT) algorithm to find all periodic-frequent patterns in a columnar temporal database

Read more

Summary

Introduction

Depending on the layout of recording data on a storage device, databases can be broadly classified into two types, namely row databases and columnar databases (row and columnar databases are referred to as horizontal and vertical databases, respectively). Row databases organize data as records, keeping all of the data associated with a record next to each other in a storage device These databases are mostly based on ACID (ACID stands for Atomicity, Consistency, Isolation, and Duration) properties and are optimized for reading and writing rows efficiently. Columnar databases organize data in the form of fields, keeping all of the data associated with a field next to each other in a storage device. These databases are mostly based on BASE (BASE stands for Basically Available, Soft state, and Eventually consistent) properties and are optimized for reading and computing on columns efficiently. For the pattern mining algorithms, the columnar databases provide an advantage of cutting off several costly operations such as pattern support counting and candidate pruning as these operations become simple binary operations

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call