Abstract

Processing gargantuan amounts of data in real time requires systems with high computational power. High Performance Computing (HPC) systems provide this power and can process big data efficiently. Data mining algorithms, which operate on many types of data to discover hidden relationships and complex patterns, are a demanding field for data scientists. As data grows, traditional algorithms, tools, and techniques need a specialized computing environment to keep working in real time. Sequential pattern mining is an important area of data mining: it analyzes data from many important domains to find sequential patterns. PrefixSpan is one of the most efficient algorithms for finding sequential patterns. However, the sequential implementation of PrefixSpan runs on a single processor, so it needs a long execution time to process a large sequence database. In an HPC system, multiple processors are connected through a high-speed interconnect and can work on the same task concurrently. Our goal is to minimize the execution time of PrefixSpan using a heterogeneous computing system. We propose a method in which the two main tasks of PrefixSpan, finding frequent items and constructing projected databases, are implemented in parallel on a GPU using the NVIDIA CUDA platform. This research also presents a new way to build projected databases: only the start index and end index of each sequence need to be stored, in a single array. This reduces memory consumption in the HPC system.
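The two PrefixSpan tasks named above can be illustrated with a small sequential sketch. The sketch makes several assumptions. First, each sequence element is a single item, whereas full PrefixSpan also allows itemsets. Second, the start/end-index idea is interpreted as a pseudo-projection. All sequences are concatenated into one item array. A projected database is then one flat array holding a `[start, end)` pair per surviving suffix, rather than copies of the suffixes. The function names `flatten`, `count_support`, `project`, and `prefixspan` are hypothetical. This is CPU Python, not the authors' CUDA implementation. Counting and projecting each suffix are independent of the other suffixes, which is what makes these two steps natural candidates for GPU parallelism.

```python
from collections import Counter


def flatten(db):
    """Concatenate all sequences into one item array and record each
    sequence's [start, end) bounds as consecutive pairs in one flat list."""
    items, bounds = [], []
    for seq in db:
        start = len(items)
        items.extend(seq)
        bounds.extend((start, len(items)))
    return items, bounds


def count_support(items, proj):
    """Task 1: support of each item = number of projected suffixes that
    contain it at least once (each suffix comes from a distinct sequence)."""
    support = Counter()
    for k in range(0, len(proj), 2):
        start, end = proj[k], proj[k + 1]
        support.update(set(items[start:end]))  # count an item once per suffix
    return support


def project(items, proj, item):
    """Task 2: build the projected database for `item`. For each suffix,
    store only new [start, end) bounds just after the first occurrence of
    `item`; no items are copied, so each suffix costs two integers."""
    new_proj = []
    for k in range(0, len(proj), 2):
        start, end = proj[k], proj[k + 1]
        for pos in range(start, end):
            if items[pos] == item:
                if pos + 1 < end:  # drop suffixes that would be empty
                    new_proj.extend((pos + 1, end))
                break
    return new_proj


def prefixspan(db, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns."""
    items, proj = flatten(db)
    patterns = {}

    def mine(prefix, proj):
        for item, sup in sorted(count_support(items, proj).items()):
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = sup
            mine(pattern, project(items, proj, item))

    mine((), proj)
    return patterns


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    print(prefixspan(db, 2))
    # {('a',): 2, ('a', 'c'): 2, ('b',): 2, ('b', 'c'): 2, ('c',): 3}
```

In this example, projecting on `a` turns the bounds `[0, 3, 3, 5, 5, 7]` into `[1, 3, 4, 5]`. The third sequence contains no `a`, so it drops out, and the two surviving suffixes are represented only by their index pairs.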
