Abstract

Nonoverlapping sequential pattern mining (SPM) is a data analysis task, which aims at identifying repetitive sequential patterns with gap constraint in a set of discrete sequences. Nonoverlapping means that any character in the sequence can be rematched by characters at different positions in the pattern, while overlapping means that any character can be reused at the same position, which can be regarded as no condition. Compared with overlapping SPM, nonoverlapping SPM has more strict constraint on the occurrences of pattern, and meets the Apriori property. However, current algorithms mine frequent or closed patterns, resulting in some low-frequency but extremely important patterns being ignored. To tackle this issue, this paper proposes to mine high average utility nonoverlapping sequential patterns (HANP). An efficient algorithm called HANP-Miner is proposed, which involves two key steps: support calculation and candidate pattern reduction. To calculate the support (the occurrence frequency of a pattern), depth-first search and backtracking strategies based on the simplified Nettree structure are adopted, an approach that effectively reduces the time and space complexities of the algorithm. To effectively reduce the number of candidate patterns, a pattern join strategy based on an upper bound on the average utility is proposed. Experiments on biological sequences and real sales sequences are carried out to verify the efficiency of HANP-Miner and the superiority of HANPs. The results demonstrate that HANP-Miner is not only more efficient, but that the HANPs mined in this way are also more valuable than existing frequent patterns. The algorithms and datasets can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/HANP-Miner.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.