Abstract

Problem statement: Sequential pattern mining is a specific data mining task, applied particularly to retail data. The task is to discover all sequential patterns that meet a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. Approach: Several algorithms for finding sequential patterns already exist, such as AprioriAll and Generalized Sequential Patterns (GSP). We present fast and efficient algorithms, called AprioriAllSID and GSPSID, for mining sequential patterns that are fundamentally different from the known algorithms. Results: The proposed algorithms were implemented and compared experimentally with AprioriAll and GSP. We also combined the AprioriAllSID algorithm with the AprioriAll algorithm into a hybrid algorithm, called AprioriAll Hybrid. Conclusion: The implementation shows that the execution time needed to find sequential patterns depends on the total number of candidates generated at each level and on the time taken to scan the database. Our performance study shows that the proposed algorithms outperform the best existing algorithms.
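The support definition in the problem statement can be illustrated with a minimal sketch. It assumes each data-sequence is an ordered list of itemsets and that a pattern is contained when its itemsets match, in order, subsets of distinct elements of the data-sequence. The names `contains`, `support`, and the toy database are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
Sequence_ = List[Itemset]  # an ordered list of itemsets (hypothetical type alias)


def contains(data_sequence: Sequence_, pattern: Sequence_) -> bool:
    """Return True if `pattern` occurs in `data_sequence` in order.

    Each pattern itemset must be a subset of a distinct element of the
    data-sequence; greedy earliest matching is sufficient for this test.
    """
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining element that covers this itemset.
        while pos < len(data_sequence) and not itemset <= data_sequence[pos]:
            pos += 1
        if pos == len(data_sequence):
            return False
        pos += 1  # the next pattern itemset must match a later element
    return True


def support(database: List[Sequence_], pattern: Sequence_) -> int:
    """Number of data-sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


# Toy retail data (assumed for illustration): one data-sequence per customer.
db = [
    [frozenset({"bread"}), frozenset({"milk"}), frozenset({"bread", "butter"})],
    [frozenset({"bread", "milk"}), frozenset({"butter"})],
    [frozenset({"milk"}), frozenset({"bread"})],
]

bread_then_butter = [frozenset({"bread"}), frozenset({"butter"})]
print(support(db, bread_then_butter))  # 2
```

With a minimum support of 2, the pattern "bread, then butter" would count as a sequential pattern in this toy database, since two of the three customers contain it.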

Highlights

  • We propose efficient algorithms, namely AprioriAllSID and GSPSID, that improve performance by reducing the size of the candidate item set Ck and the I/O cost (Wang, 2010; Yong-Qing et al., 2009; Yang et al., 2009)

  • Algorithm AprioriAllSID: In Fig. 1, we present an efficient algorithm called AprioriAllSID, which is used to discover all sequential patterns in a large customer database

  • Even though AprioriAllSID and GSPSID seem to be nearly equal, for massive volumes of data the performance of AprioriAllSID and GSPSID will be far better than that of the AprioriAll and Generalized Sequential Patterns (GSP) algorithms


Summary

INTRODUCTION

Algorithm AprioriAllSID: In Fig. 1, we present an efficient algorithm called AprioriAllSID, which is used to discover all sequential patterns in a large customer database. Algorithm GSPSID: In Fig. 3, we propose an efficient algorithm called GSPSID, which is used to discover all generalized sequential patterns in a large customer database. Applying the candidate-gen procedure to the frequent sequences of size 1 gives the candidate sequences in C2; the algorithm iterates over the entries in C’2 and generates C’2 in steps 6-11. Each candidate sequence is assigned a unique number called its SID. Extensions: This field stores the IDs of all the sequences in Ck+1 obtained as an extension of Ck. The s.set-of-sequence field of C’k-1 gives the IDs of all the (k-1)-candidate sequences contained in transaction s.SID. We add Ck to Ct; with this data structure, the candidate sequences can be stored and processed efficiently.
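The bookkeeping described above (a unique SID per candidate, an extensions field, and per-customer sets of contained candidate IDs) can be sketched roughly as follows. This is not the authors' AprioriAllSID pseudocode from Fig. 1. It is an illustration under several assumptions: sequences are tuples of single items, an extension appends one item to a candidate, and the raw data-sequences stay available for a direct containment check. All names (`Candidate`, `next_level`, `raw`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Candidate:
    sid: int                    # unique number assigned to the candidate sequence
    sequence: Tuple[str, ...]   # the candidate sequence itself
    # SIDs of the (k+1)-candidates obtained by extending this candidate
    extensions: List[int] = field(default_factory=list)


def is_subsequence(pattern: Tuple[str, ...], data: Tuple[str, ...]) -> bool:
    """True if the items of `pattern` appear in `data` in the same order."""
    it = iter(data)
    return all(item in it for item in pattern)


def next_level(
    prev_entries: Dict[int, Set[int]],
    candidates: Dict[int, Candidate],
    raw: Dict[int, Tuple[str, ...]],
) -> Tuple[Dict[int, int], Dict[int, Set[int]]]:
    """Count k-candidates using the (k-1)-level entries.

    prev_entries maps a customer id to the SIDs of the (k-1)-candidates
    contained in that customer's data-sequence. Only extensions of those
    candidates are checked, and customers that contain no k-candidate are
    dropped from the next level's entries.
    """
    counts: Dict[int, int] = {}
    new_entries: Dict[int, Set[int]] = {}
    for customer, contained in prev_entries.items():
        to_check = {e for sid in contained for e in candidates[sid].extensions}
        found = {
            sid for sid in to_check
            if is_subsequence(candidates[sid].sequence, raw[customer])
        }
        for sid in found:
            counts[sid] = counts.get(sid, 0) + 1
        if found:
            new_entries[customer] = found
    return counts, new_entries


# Toy data (assumed): three customers and candidates of sizes 1 and 2.
raw = {1: ("a", "b", "c"), 2: ("a", "c"), 3: ("b", "a")}
candidates = {
    1: Candidate(1, ("a",), [4, 5]),
    2: Candidate(2, ("b",), [6]),
    3: Candidate(3, ("c",)),
    4: Candidate(4, ("a", "b")),
    5: Candidate(5, ("a", "c")),
    6: Candidate(6, ("b", "c")),
}
level1 = {1: {1, 2, 3}, 2: {1, 3}, 3: {1, 2}}

counts, level2 = next_level(level1, candidates, raw)
print(counts)  # candidate SID 5, the sequence (a, c), is counted twice
print(level2)  # customer 3 contains no 2-candidate and drops out
```

In this sketch the set of entries shrinks from one level to the next, which illustrates in a simplified way how the amount of data scanned per pass could decrease.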
