Dramatically Reducing Search for High Utility Sequential Patterns by Maintaining Candidate Lists

Scott Buffett

doi:10.3390/info11010044

Abstract

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is centered on the necessity to reduce the time and space required to perform the search. The extent of this reduction proportionally facilitates the ability to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify frequent patterns that are (1) sequential in nature and (2) hold a significant magnitude of utility in a sequence database, by considering the aspect of item value or importance. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, with HUSPM, this property does not hold. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. Such candidates are provably the only items that are ever needed for any extension of a given sequential pattern or its descendants in the search tree. This list is then exploited to significantly further tighten the upper bound on the utilities of descendant patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated in the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. Validation of the techniques was conducted on six public datasets. Tests show that use of the CRUSP algorithm results in a significant reduction in the overall number of candidate sequential patterns that need to be considered, and subsequently a significant reduction in run time, when compared to the current state of the art in bounding techniques. When employing the CRUSPPivot algorithm, the further reduction in the size of the search space was found to be dramatic, with the reduction in run time found to be dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that the time required for one particularly complex dataset was reduced from many hours to less than one minute.
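The claim that downward closure fails for utility can be illustrated with a minimal sketch on a hypothetical toy database. The simplifications here are assumptions made for illustration only: every sequence element is a single (item, utility) pair rather than an itemset, the utility of a pattern in a sequence is taken as the maximum over its occurrences, and the names `max_utility`, `pattern_utility`, `support` and the data in `db` do not come from the paper.

```python
from typing import List, Optional, Tuple

Sequence = List[Tuple[str, int]]  # hypothetical format: (item, utility) pairs

def max_utility(pattern: List[str], seq: Sequence) -> Optional[int]:
    """Best utility over all occurrences of pattern as a subsequence of seq,
    or None if the pattern does not occur."""
    # best[k] holds the best utility found so far for the first k pattern items.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for item, util in seq:
        # Walk k downwards so one sequence element matches at most one pattern item.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] is not None:
                cand = best[k - 1] + util
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[len(pattern)]

def pattern_utility(pattern: List[str], db: List[Sequence]) -> int:
    """Sum of the per-sequence maximum utilities over sequences containing pattern."""
    found = (max_utility(pattern, s) for s in db)
    return sum(u for u in found if u is not None)

def support(pattern: List[str], db: List[Sequence]) -> int:
    """Number of sequences that contain pattern."""
    return sum(1 for s in db if max_utility(pattern, s) is not None)

# Toy data, invented for this example.
db = [
    [("a", 1), ("b", 5), ("c", 2)],
    [("a", 2), ("c", 1)],
    [("b", 4), ("a", 1), ("b", 3)],
]

print(support(["a"], db), support(["a", "b"], db))                  # 3 2
print(pattern_utility(["a"], db), pattern_utility(["a", "b"], db))  # 4 10
```

Extending the pattern from a to a, b lowers the support from 3 to 2, as downward closure guarantees, yet raises the utility from 4 to 10. A utility threshold therefore cannot be used directly to discard a prefix, which is why upper bounds on the utility of extensions are needed instead.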

Highlights

High utility sequential pattern mining (HUSPM) [1,2] is a subfield of frequent pattern mining [3] that assigns levels of relative magnitude or importance to objects with the goal of identifying more impactful patterns.
Two approaches are tested, namely the maximum concatenation utility (MCU) method, which maintains a list of items that are candidates for future concatenation, and the reduced concatenation utility (RCU) method, which further reduces the upper bound on descendant pattern utilities by capitalizing on items having been removed from the candidate lists.
The performance of these two approaches is compared to that of two state-of-the-art approaches from the literature, namely the sequence-weighted utility (SWU) method for determining upper bounds on candidate utilities as implemented by uSpan [2] and the reduced sequence utility (RSU) method implemented by HUS-Span [7].
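As a point of reference for the baseline bound, the following is a minimal sketch of sequence-weighted utility under its usual definition: the SWU of a pattern is the sum of the total utilities of all sequences that contain it. The data format (single (item, utility) pairs per element), the helper names `contains` and `swu`, and the toy database are assumptions for illustration and are not taken from uSpan or from the paper.

```python
from typing import List, Tuple

Sequence = List[Tuple[str, int]]  # hypothetical format: (item, utility) pairs

def contains(pattern: List[str], seq: Sequence) -> bool:
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    items = iter(item for item, _ in seq)
    # `p in items` advances the iterator until p is found, which preserves order.
    return all(p in items for p in pattern)

def swu(pattern: List[str], db: List[Sequence]) -> int:
    """Sum of whole-sequence utilities over the sequences that contain pattern."""
    return sum(sum(u for _, u in seq) for seq in db if contains(pattern, seq))

# Toy data, invented for this example.
db = [
    [("a", 1), ("b", 5), ("c", 2)],   # total utility 8
    [("a", 2), ("c", 1)],             # total utility 3
    [("b", 4), ("a", 1), ("b", 3)],   # total utility 8
]

print(swu(["a"], db))       # 19
print(swu(["a", "b"], db))  # 16
```

Because a longer pattern can only be contained in a subset of the sequences that contain its prefix, SWU never increases as a pattern is extended, so it can safely prune. It is loose, however, since it counts the utility of every item in a matching sequence, including items that could never be part of the pattern.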

Summary

Introduction

High utility sequential pattern mining (HUSPM) [1,2] is a subfield of frequent pattern mining [3]. Existing bound-based search pruning methods, namely the prefix extension utility (PEU) and reduced sequence utility (RSU) approaches, are extended by a search technique that maintains a list of candidate concatenation items. The use of this list has a significant impact on the search process since, for any sequential pattern sp under consideration in the search, only the items in the candidate list associated with sp need ever be considered for concatenation for any supersequence of sp with sp as prefix. A relaxed upper bound on the utility of all pattern extensions, referred to as the pivot-centered prefix extension utility (PPEU), is also proposed. While this value will always be greater than or equal to the PEU for a particular sequential pattern, seemingly rendering it less effective at pruning, it has the significant benefit that remaining utility values do not need to be maintained at all positions in the database.
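The candidate-list idea can be sketched as a depth-first search in which each node passes a filtered item list to its children. This is not the CRUSP algorithm: as a stand-in for the paper's tighter bounds, the sketch filters with the simpler sequence-weighted utility (SWU), and it assumes single-item sequence elements and sequence-style concatenation only. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sequence = List[Tuple[str, int]]  # hypothetical format: (item, utility) pairs

def max_utility(pattern: List[str], seq: Sequence) -> Optional[int]:
    """Best utility over all occurrences of pattern in seq, or None if absent."""
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for item, util in seq:
        for k in range(len(pattern), 0, -1):  # downwards: one element, one match
            if pattern[k - 1] == item and best[k - 1] is not None:
                cand = best[k - 1] + util
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[len(pattern)]

def utility(pattern: List[str], db: List[Sequence]) -> int:
    found = (max_utility(pattern, s) for s in db)
    return sum(u for u in found if u is not None)

def swu(pattern: List[str], db: List[Sequence]) -> int:
    """Stand-in upper bound: total utility of every sequence containing pattern."""
    return sum(sum(u for _, u in s) for s in db
               if max_utility(pattern, s) is not None)

def mine(db: List[Sequence], min_util: int) -> Dict[Tuple[str, ...], int]:
    """Return all patterns with utility >= min_util, using inherited candidate lists."""
    if min_util <= 0:
        raise ValueError("min_util must be positive so the search terminates")
    results: Dict[Tuple[str, ...], int] = {}

    def search(prefix: List[str], candidates: List[str]) -> None:
        # An item whose bound fails here is dropped for this node AND for all
        # descendants, because the SWU bound cannot grow as the prefix grows.
        kept = [i for i in candidates if swu(prefix + [i], db) >= min_util]
        for item in kept:
            ext = prefix + [item]
            u = utility(ext, db)
            if u >= min_util:
                results[tuple(ext)] = u
            search(ext, kept)  # children only ever see the reduced list

    search([], sorted({item for s in db for item, _ in s}))
    return results

# Toy data, invented for this example.
db = [
    [("a", 1), ("b", 5), ("c", 2)],
    [("a", 2), ("c", 1)],
    [("b", 4), ("a", 1), ("b", 3)],
]
print(mine(db, 8))
```

On this toy database with a threshold of 8, the search reports the patterns a b (utility 10), a b c (8), b (9) and b a b (8). The point of the design is that the list shrinks monotonically along each branch, so deeper nodes test fewer extensions than a search that retries every item at every node.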

Literature Review

Sequential Pattern Mining

High Utility Sequential Pattern Mining

Lexicographic Tree Search

Existing Pruning Strategies for HUSPM

PEU-Based Candidate Maintenance

The CRUSP Algorithm

Pivot-Centered PEU-Based Candidate Maintenance

The CRUSPPivot Algorithm

Objectives and Hypotheses

Conclusions


Journal: Information
Publication Date: Jan 15, 2020
License type: CC BY 4.0

Similar Papers

A pure array structure and parallel strategy for high-utility sequential pattern mining
Bac Le ... Duy-Tai Dinh
Expert Systems with Applications | VOL. 104
12 Mar 2018

Mining actionable combined high utility incremental and associated sequential patterns
Min Shi ... Unil Yun
PLoS ONE | VOL. 18
29 Mar 2023

Memory-adaptive high utility sequential pattern mining over data streams
Morteza Zihayat ... Aijun An
Machine Learning | VOL. 106
02 Feb 2017

Candidate List Maintenance in High Utility Sequential Pattern Mining
Scott Buffett
01 Dec 2018
