Abstract

An important data mining task is to discover all high utility sequences in a quantitative sequence database. Although these sequences are useful, their number is often very large. To find patterns that are better tailored to a user's needs, this paper studies the problem of mining frequent high utility sequences that satisfy item constraints. It proposes a novel algorithm named C-FHUSM that quickly obtains these sequences from two concise representations discovered in a quantitative sequence database, namely frequent generator high utility sequences and frequent closed high utility sequences. The first set is extracted using a novel algorithm named FGenHUSM, while an existing algorithm is applied to extract the second set. C-FHUSM integrates novel pruning techniques that discard sequences violating the item constraints early, by checking only a small number of representative sequences at the beginning of the mining process. Experimental results show that C-FHUSM can be more than ten times faster than a modified version of the state-of-the-art EHUSM algorithm for mining sequences with item constraints, and that it scales better. Moreover, C-FHUSM is beneficial when a user frequently changes constraints, because the results can be updated without rescanning the database.
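To make the idea of "updating results without rescanning the database" concrete, the following is a minimal sketch of a naive post-filter over an already-mined pattern list. It assumes a sequence is a tuple of itemsets, that support and utility values are precomputed, and that an item constraint means "every item in a required set must occur in the sequence". The names `Pattern`, `items_of`, `satisfies` and `filter_patterns` are hypothetical illustrations. This is not C-FHUSM's pruning technique, which works from the generator and closed representations rather than from a full pattern list.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# Hypothetical representation: a sequence is an ordered tuple of itemsets.
Sequence = Tuple[FrozenSet[str], ...]


class Pattern(NamedTuple):
    """A mined sequence with its support and utility (assumed precomputed)."""
    seq: Sequence
    support: int
    utility: int


def items_of(seq: Sequence) -> FrozenSet[str]:
    """Union of all items appearing anywhere in the sequence."""
    return frozenset().union(*seq)


def satisfies(seq: Sequence, required: FrozenSet[str]) -> bool:
    """Assumed item constraint: every required item occurs in the sequence."""
    return required <= items_of(seq)


def filter_patterns(patterns: List[Pattern], required: FrozenSet[str],
                    min_sup: int, min_util: int) -> List[Pattern]:
    """Keep frequent high utility patterns that satisfy the constraint.

    Only the in-memory pattern list is inspected, so changing `required`
    never touches the original sequence database.
    """
    return [p for p in patterns
            if p.support >= min_sup and p.utility >= min_util
            and satisfies(p.seq, required)]


if __name__ == "__main__":
    mined = [
        Pattern((frozenset({"a"}), frozenset({"b", "c"})), support=3, utility=40),
        Pattern((frozenset({"a"}), frozenset({"d"})), support=2, utility=55),
        Pattern((frozenset({"c"}),), support=4, utility=12),
    ]
    # Re-running with a different `required` set reuses `mined` as-is.
    for p in filter_patterns(mined, frozenset({"a"}), min_sup=2, min_util=30):
        print(p)
```

A post-filter like this still requires the complete set of frequent high utility sequences to be mined and stored first. Deriving results from the much smaller concise representations, and pruning early against the constraints, is what the abstract credits for C-FHUSM's speed advantage.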

