Location-Based Parallel Sequential Pattern Mining Algorithm

Byoungwook Kim, Gangman Yi

doi:10.1109/access.2019.2939937

Abstract

Sequential pattern mining, which finds frequent sequential patterns in a set of data sequences, is an important data mining problem. However, existing sequential pattern mining considers only the order in which items are purchased, not the location where each item is purchased. In this paper, we develop a sequential pattern mining algorithm using Apache Spark. The proposed algorithm finds frequent sequential patterns in parallel by distributing the data across several machines. We conducted a comprehensive performance study of the proposed algorithm on synthetic data sets, varying several parameter values. The experimental results show that the speed of the proposed algorithm improves linearly with the number of machines.

Highlights

The development of IT and of the computer and internet industries has increased the need to handle large amounts of data in modern society
We propose a location-based sequential pattern mining algorithm based on PrefixSpan to handle location data
We evaluate the performance of our two proposed sequential pattern mining algorithms, Naïve Location-based PrefixSpan (NLPS) and MapReduce Location-based PrefixSpan (MRLPS)

Summary

INTRODUCTION

The development of IT and of the computer and internet industries has increased the need to handle large amounts of data in modern society. We have developed a sequential pattern mining algorithm that considers the purchase location, using the MapReduce programming model in a Hadoop distributed environment. Problem: given a database that contains m location-based sequences and a specified minimum support δ, find all sequential patterns in the database whose support is at least δ. Most earlier algorithms generated sequential pattern candidates with an apriori-style method. This approach had to tally a large set of candidate sequence patterns and had to scan the database multiple times to find long sequential patterns. To solve this problem, the PrefixSpan (Prefix-projected Sequential pattern mining) algorithm was proposed. The support count of β in α-projected database D|r, denoted as supportD|r (β), is the number of sequences γ in D|r
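The prefix-projection idea can be illustrated with a minimal, single-machine sketch. It is not the authors' NLPS or MRLPS implementation. It assumes that each sequence element is a single (item, location) pair, that two elements match only when both item and location are equal, and that the minimum support is an absolute count; the function name `prefixspan` and the sample data are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent sequential pattern.

    sequences: list of lists of hashable elements, here (item, location) pairs.
    min_support: minimum number of sequences that must contain a pattern.
    """
    results = {}

    def mine(prefix, projected):
        # projected is a list of (sequence index, start offset) pairs: the
        # suffix of each sequence after the first match of the current prefix.
        counts = defaultdict(int)
        for seq_id, start in projected:
            # set() makes each element count at most once per sequence.
            for element in set(sequences[seq_id][start:]):
                counts[element] += 1

        for element, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (element,)
            results[pattern] = support

            # Build the projected database for the extended prefix by
            # advancing past the first occurrence of the element.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == element:
                        next_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, next_projected)

    # The empty prefix projects every sequence onto itself.
    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical purchase sequences: (item, location) pairs in purchase order.
db = [
    [("milk", "A"), ("bread", "B"), ("eggs", "A")],
    [("milk", "A"), ("eggs", "A")],
    [("milk", "B"), ("bread", "B")],
]
patterns = prefixspan(db, min_support=2)
```

In this toy database, milk bought at location B occurs in only one sequence, so it is not frequent, while milk bought at location A followed by eggs bought at location A is supported by two sequences. Each recursive call scans only the projected suffixes, with no candidate generation step.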

ALGORITHMS

NAÏVE APPROACH

EXPERIMENTS

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 11
License type: CC BY 4.0
