An Efficient Method for Mining Top-K Closed Sequential Patterns

Thi-Thiet Pham, Tung Do, Bay Vo, Tzung-Pei Hong, Anh Nguyen

doi:10.1109/access.2020.3004528

Abstract

Mining closed sequential patterns (CSPs) is an essential data mining task with many applications. CSPs are used to cope with huge databases or low minimum support (minsup) thresholds in sequential pattern mining. However, tuning the minsup value so that the number of CSPs generated matches what users want is difficult and time-consuming. To address this issue, the TSP algorithm was previously proposed for mining the top-k CSPs, where k is a user-given parameter; it returns the k CSPs with the highest support values in a database. However, its execution time and memory usage are high. In this paper, an algorithm named TKCS (Top-K Closed Sequences) is proposed to mine the top-k CSPs efficiently. To reduce execution time and memory usage, TKCS represents the data as a vertical bitmap database. It also adopts several strategies while mining the top-k CSPs, such as always choosing the sequential patterns with the greatest support values for generating candidate patterns, and storing the top-k CSPs in ascending order of support so that the minsup value can be raised more quickly. The empirical results show that TKCS outperforms TSP in discovering the top-k CSPs in terms of both runtime and memory usage.
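The vertical bitmap representation mentioned above can be illustrated with a minimal sketch. It assumes a SPAM-style layout (one bitmap per item and per sequence, with bit j set when the item occurs in the j-th itemset of that sequence) and uses Python integers as bitmaps; the names `build_vertical_bitmaps`, `s_extend`, `i_extend` and `support` are hypothetical, and the exact structure used by TKCS may differ.

```python
from typing import Dict, List, Set

# Assumed SPAM-style layout: a sequence is a list of itemsets,
# a database is a list of sequences.
Sequence = List[Set[str]]
Bitmap = List[int]  # one int per sequence; bit j <=> j-th itemset


def build_vertical_bitmaps(db: List[Sequence]) -> Dict[str, Bitmap]:
    """Map each item to one bitmap per sequence; bit j is set when the
    item occurs in the j-th itemset of that sequence."""
    bitmaps: Dict[str, Bitmap] = {}
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps


def support(bitmap: Bitmap) -> int:
    """Support = number of sequences whose bitmap is non-empty."""
    return sum(1 for bits in bitmap if bits)


def s_extend(prefix: Bitmap, item: Bitmap) -> Bitmap:
    """Sequence extension: keep occurrences of `item` strictly after the
    earliest position where the prefix pattern ends in each sequence."""
    out: Bitmap = []
    for p, b in zip(prefix, item):
        if p == 0:
            out.append(0)
            continue
        first = p & -p                        # lowest set bit = earliest end
        out.append(b & ~((first << 1) - 1))   # clear bits up to that position
    return out


def i_extend(prefix: Bitmap, item: Bitmap) -> Bitmap:
    """Itemset extension: the item must occur in the same itemset where
    the prefix pattern ends."""
    return [p & b for p, b in zip(prefix, item)]


# Toy database with three sequences.
db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}],
    [{"a", "d"}, {"c"}],
    [{"e", "f"}, {"a", "b"}],
]
bm = build_vertical_bitmaps(db)
print(support(bm["a"]))                     # 3
print(support(s_extend(bm["a"], bm["c"])))  # 2, pattern <a, c>
print(support(i_extend(bm["a"], bm["b"])))  # 2, pattern <(a, b)>
```

With this layout, counting support only checks which per-sequence integers are non-zero, and each candidate extension is a bitwise operation on existing bitmaps rather than a new scan of the database.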

Highlights

Figures 2-7 show the runtimes of the TKCS and TSP algorithms for mining the top-k closed sequential patterns (CSPs). These experimental results show that TKCS runs much faster than TSP on all databases and for all tested values of k, especially when the user chooses a larger k and the sequence database is large with many items.
This work addresses the top-k CSP problem by increasing the minsup value during mining so that the sequential-pattern mining algorithm generates exactly the number of CSPs desired by the user.
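The strategies named in the abstract (expanding the candidate with the greatest support first, and keeping the current top-k in ascending order of support so that the smallest one, which becomes the new minsup, is always at the front) can be illustrated with a simplified sketch. It assumes sequences of single items and omits the closedness check, so it returns top-k sequential patterns rather than top-k CSPs; `top_k_patterns` and its helpers are hypothetical names, not the TKCS implementation, and ties at the k-th support are broken arbitrarily.

```python
import bisect
import heapq
from typing import List, Tuple

Pattern = Tuple[str, ...]


def contains(seq: List[str], pat: Pattern) -> bool:
    """True if pat is a (not necessarily contiguous) subsequence of seq."""
    it = iter(seq)
    return all(any(x == p for x in it) for p in pat)


def support(db: List[List[str]], pat: Pattern) -> int:
    return sum(1 for seq in db if contains(seq, pat))


def top_k_patterns(db: List[List[str]], k: int) -> List[Tuple[int, Pattern]]:
    items = sorted({x for seq in db for x in seq})
    minsup = 1
    top: List[Tuple[int, Pattern]] = []  # kept in ascending order of support
    # Max-heap via negated support: the most frequent candidate is expanded first.
    heap = [(-support(db, (i,)), (i,)) for i in items]
    heapq.heapify(heap)
    while heap:
        neg_sup, pat = heapq.heappop(heap)
        sup = -neg_sup
        if sup < minsup:
            break  # all remaining candidates have support <= sup < minsup
        bisect.insort(top, (sup, pat))
        if len(top) > k:
            top.pop(0)  # drop the lowest-support entry (front of the list)
        if len(top) == k:
            minsup = max(minsup, top[0][0])  # raise minsup to the k-th best support
        for i in items:  # sequence extensions by one item
            ext = pat + (i,)
            s = support(db, ext)
            if s >= minsup:  # anti-monotone pruning with the current minsup
                heapq.heappush(heap, (-s, ext))
    return list(reversed(top))


db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
result = top_k_patterns(db, 5)
print([s for s, _ in result])  # [3, 3, 3, 2, 2]
```

Because candidates are popped in non-increasing order of support, the list fills with high-support patterns early, and the front element gives the raised minsup directly, which is what lets low-support candidates be pruned sooner.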

Summary

INTRODUCTION

Mining sequential patterns from a sequence database is an essential data mining task that has been studied extensively [1], [3], [4], [8]–[11], [14], [17], [23], [27], [35]. AprioriAll [1], proposed by Agrawal et al. in 1995, was the first algorithm designed to solve the sequential pattern mining problem and is the basis for later algorithms such as GSP [27], SPADE [35], SPAM [3], FREESPAN [12], PREFIXSPAN [23], PRISM [11], and MCM-SPADE [14]. All of the above algorithms for mining sequential patterns or CSPs from a sequence database require the user to specify a minimum support threshold. Increasing the minsup value within these sequential-pattern mining algorithms means adjusting this parameter automatically so that exactly the number of CSPs desired by the user is generated.
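To make the role of minsup concrete, the brute-force sketch below enumerates the frequent sequential patterns of a toy database for a fixed minsup and keeps the closed ones, that is, those with no proper super-pattern of equal support. It is an illustration only, assuming sequences of single items; the helper names are hypothetical, and practical miners such as those cited above avoid this exhaustive enumeration. Changing minsup changes how many CSPs are produced, which is the tuning burden that the top-k formulation removes.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pat: Pattern, seq) -> bool:
    """True if pat occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(any(x == p for x in it) for p in pat)


def frequent_patterns(db: List[List[str]], minsup: int) -> Dict[Pattern, int]:
    # Brute force: every subsequence of every sequence is a candidate.
    candidates = set()
    for seq in db:
        for r in range(1, len(seq) + 1):
            candidates.update(combinations(seq, r))  # order-preserving
    counts = {p: sum(is_subsequence(p, s) for s in db) for p in candidates}
    return {p: c for p, c in counts.items() if c >= minsup}


def closed_patterns(db: List[List[str]], minsup: int) -> Dict[Pattern, int]:
    freq = frequent_patterns(db, minsup)
    # Closed: no proper super-pattern with the same support. Such a
    # super-pattern would itself be frequent, so checking `freq` suffices.
    return {
        p: c for p, c in freq.items()
        if not any(len(q) > len(p) and d == c and is_subsequence(p, q)
                   for q, d in freq.items())
    }


db = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
for minsup in (1, 2, 3):
    print(minsup, len(closed_patterns(db, minsup)))
# 1 4
# 2 3
# 3 1
```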

RELATED WORK

TKCS ALGORITHM

EXPERIMENTAL RESULTS

CONCLUSIONS

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 11
License type: CC BY 4.0

Similar Papers

An efficient model for information gain of sequential pattern from web logs based on dynamic weight constraint
Dhirendra Kumar Jha ... Archana Tomar
01 Oct 2010

A Novel Approach for Mining Closed Clickstream Patterns
Bao Huynh ... Bay Vo
Cybernetics and Systems | VOL. 52
11 Jan 2021

EFFICIENTLY MINING CLOSED SEQUENTIAL PATTERNS USING PREFIX TREE
Pham Thi Thiet ... Van Vo
Journal of Science and Technology - IUH | VOL. 28
11 Nov 2020

Techniques for Understanding User Usage Behavior on the Internet
Abhijit R Joshi ... Aparna Ranade-Halbe
International Journal of Computer Applications | VOL. 92
18 Apr 2014
