Fast Algorithms for Discovering Sequential Patterns in Massive Datasets

Dharani Dharani

doi:10.3844/jcssp.2011.1325.1329

Abstract

Problem statement: Sequential pattern mining is a specific data mining task, applied particularly to retail data. The task is to discover all sequential patterns that meet a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. Approach: Several algorithms exist for finding sequential patterns, such as AprioriAll and Generalized Sequential Patterns (GSP). We present fast and efficient algorithms, called AprioriAllSID and GSPSID, for mining sequential patterns that are fundamentally different from known algorithms. Results: The proposed algorithms were implemented and compared experimentally with AprioriAll and GSP. We also combined the AprioriAllSID algorithm with the AprioriAll algorithm into a hybrid algorithm, called AprioriAll Hybrid. Conclusion: The implementation shows that the execution time needed to find sequential patterns depends on the total number of candidates generated at each level and on the time taken to scan the database. Our performance study shows that the proposed algorithms outperform the best existing algorithms.
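The support definition in the abstract can be illustrated with a minimal sketch. It assumes the usual representation in which a data-sequence is an ordered list of transactions and each transaction is a set of items; the names contains, support, frequent, seq and the toy DATABASE are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]          # one transaction: a set of items
DataSequence = Sequence[Element]  # one customer's transactions, in time order


def contains(data_seq: DataSequence, pattern: DataSequence) -> bool:
    """True if every element of `pattern` is a subset of a distinct
    transaction of `data_seq`, with the order of elements preserved."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest transaction that covers this element.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def support(database: Sequence[DataSequence], pattern: DataSequence) -> int:
    """Number of data-sequences that contain the pattern."""
    return sum(1 for data_seq in database if contains(data_seq, pattern))


def frequent(database: Sequence[DataSequence],
             candidates: Sequence[DataSequence],
             min_support: int) -> List[DataSequence]:
    """Keep the candidates whose support reaches the user-specified minimum."""
    return [c for c in candidates if support(database, c) >= min_support]


def seq(*elements) -> List[Element]:
    """Hypothetical helper: build a sequence from plain Python sets."""
    return [frozenset(e) for e in elements]


# Toy data invented for illustration only.
DATABASE = [
    seq({"bread"}, {"milk"}),
    seq({"bread", "eggs"}, {"jam"}, {"milk", "tea"}),
    seq({"milk"}, {"bread"}),
    seq({"bread", "milk"}),
]

print(support(DATABASE, seq({"bread"}, {"milk"})))  # 2
print(support(DATABASE, seq({"bread", "milk"})))    # 1
```

Every call to support scans the whole database, which is the cost that the conclusion of the abstract ties to the number of candidates per level.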

Highlights

We propose two efficient algorithms, AprioriAllSID and GSPSID, that improve performance by reducing the size of the candidate item set Ck and the I/O cost (Wang, 2010; Yong-Qing et al., 2009; Yang et al., 2009).
Algorithm AprioriAllSID: In Fig. 1, we present an efficient algorithm called AprioriAllSID, which is used to discover all sequential patterns in a large customer database.
For the performance comparison, we used five different data sets. Even though AprioriAllSID and GSPSID seem to be nearly equal, for massive volumes of data the performance of AprioriAllSID and GSPSID will be far better than that of the AprioriAll and Generalized Sequential Patterns (GSP) algorithms.

Summary

INTRODUCTION

Algorithm AprioriAllSID: In Fig. 1, we present an efficient algorithm called AprioriAllSID, which is used to discover all sequential patterns in a large customer database. Algorithm GSPSID: In Fig. 3, we propose an efficient algorithm called GSPSID, which is used to discover all generalized sequential patterns in a large customer database. Applying the candidate-gen procedure to the frequent sequences of size 1 gives the candidate sequences in C2; steps 6-11 iterate over the entries in C’2 and generate C’2. Each candidate sequence is assigned a unique number called its SID. Extensions: This field stores the IDs of all the sequences Ck+1 obtained as extensions of Ck. For an entry s of C’k-1, s.set-of-sequence gives the IDs of all the (k-1)-candidate sequences contained in transaction s.SID. We add Ck to Ct. Using this data structure, the candidate sequences can be stored and processed efficiently.
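The bookkeeping described above (a unique SID per candidate, an extensions field, and a per-customer set of contained candidate IDs) might be organised along the following lines. This is only a sketch of one possible reading and not the paper's Fig. 1 or Fig. 3: the names Candidate, first_level, extend_level and encode_customer are hypothetical, sequences are simplified to tuples of single items, self-joins are skipped, and the explicit order check in encode_customer is an assumption added here because two contained generators do not by themselves guarantee that the joined sequence is contained.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, Iterator, Set, Tuple

Items = Tuple[str, ...]


@dataclass
class Candidate:
    sid: int                            # unique number of this candidate
    items: Items                        # the candidate sequence itself
    generators: Tuple[int, ...] = ()    # SIDs of the two sequences it was joined from
    extensions: Set[int] = field(default_factory=set)  # SIDs of longer candidates


def is_subsequence(pattern: Items, data: Items) -> bool:
    """True if `pattern` occurs in `data` in order (gaps allowed)."""
    it = iter(data)
    return all(item in it for item in pattern)


def first_level(items: Iterable[str], ids: Iterator[int]) -> Dict[int, Candidate]:
    """One size-1 candidate per item, each with a fresh SID."""
    level = {}
    for item in items:
        cand = Candidate(next(ids), (item,))
        level[cand.sid] = cand
    return level


def extend_level(prev: Dict[int, Candidate], ids: Iterator[int]) -> Dict[int, Candidate]:
    """Join candidates that agree on all but their last item and record
    each new SID in the extensions field of its first generator."""
    new = {}
    for a in list(prev.values()):
        for b in list(prev.values()):
            if a.sid != b.sid and a.items[:-1] == b.items[:-1]:
                cand = Candidate(next(ids), a.items + (b.items[-1],), (a.sid, b.sid))
                a.extensions.add(cand.sid)
                new[cand.sid] = cand
    return new


def encode_customer(prev_sids: Set[int], prev: Dict[int, Candidate],
                    nxt: Dict[int, Candidate], data: Items) -> Set[int]:
    """Given the SIDs of the (k-1)-candidates contained in one customer
    sequence, return the SIDs of the k-candidates it contains."""
    found = set()
    for sid in prev_sids:
        for ext in prev[sid].extensions:
            cand = nxt[ext]
            # Generators act as a cheap filter; the order check is the
            # added safeguard described in the lead-in (an assumption).
            if set(cand.generators) <= prev_sids and is_subsequence(cand.items, data):
                found.add(ext)
    return found


# Usage on one invented customer sequence.
ids = count(1)
c1 = first_level(["a", "b", "c"], ids)
c2 = extend_level(c1, ids)
customer = ("a", "c", "b")
s1 = {sid for sid, c in c1.items() if is_subsequence(c.items, customer)}
s2 = encode_customer(s1, c1, c2, customer)
print(sorted(c2[sid].items for sid in s2))
```

Only candidates reachable through the extensions of sequences already known to be contained are examined, which is how a structure of this kind can shrink the work done per customer as k grows.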

RESULTS

DISCUSSION

CONCLUSION

Journal: Journal of Computer Science	Publication Date: Sep 1, 2011
Citations: 3	License type: cc-by

Similar Papers

An efficient algorithm for mining sequential generator pattern using prefix trees and hash tables
Thi Thiet Pham ... Jiawei Luo
International Journal of Intelligent Systems Technologies and Applications | VOL. 13
01 Jan 2014

An efficient model for information gain of sequential pattern from web logs based on dynamic weight constraint
Dhirendra Kumar Jha ... Anil Rajput
01 Oct 2010

Techniques for Understanding User Usage Behavior on the Internet
Abhijit R Joshi ... Aparna Ranade-Halbe
International Journal of Computer Applications | VOL. 92
18 Apr 2014

Efficiently Mining Sequential Generator Patterns Using Prefix Trees
Thi-Thiet Pham
Fundamenta Informaticae | VOL. 138
01 Jan 2015
