Abstract

Mining long frequent sequences that contain a combinatorial number of frequent subsequences, or mining sequential patterns with very low support thresholds, is usually both time- and memory-consuming. Mining closed sequential patterns, sequential generator patterns, and maximal sequences has been proposed to overcome this problem. When used together with closed sequential patterns, sequential generator patterns provide additional information that closed sequential patterns alone cannot. Mining sequential generator patterns is therefore also an important data mining task. This paper proposes an algorithm called MSGP-PreTree for mining all sequential generator patterns based on a prefix-tree structure. The algorithm exploits the characteristics of sequential generator patterns and sequence extensions to perform an efficient depth-first search on the prefix tree. It also uses a vertical approach, based on prime block encoding, to represent candidate sequences and to count their supports. In addition, MSGP-PreTree applies two pruning strategies, which make its search space much smaller than those of other algorithms. Experimental results on synthetic and real databases show that the proposed algorithm is more effective than a previous one.
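To make the notion of a sequential generator pattern concrete, here is a minimal brute-force sketch in Python. It assumes the common definition: a frequent sequence is a generator if no proper, non-empty subsequence has the same support. It does not reproduce MSGP-PreTree's prefix tree, prime block encoding, or pruning strategies, and all function names are hypothetical. Because support is anti-monotone, it is enough to compare a pattern against the subsequences obtained by deleting a single item.

```python
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Sequence = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Sequence:
    """Hypothetical helper: seq("ab", "c") builds the sequence <(ab)(c)>."""
    return tuple(frozenset(s) for s in itemsets)


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` is a subsequence of `sequence` (greedy leftmost matching)."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(db: List[Sequence], pattern: Sequence) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(1 for s in db if contains(s, pattern))


def one_item_deletions(pattern: Sequence) -> Set[Sequence]:
    """All non-empty sequences obtained by deleting exactly one item from `pattern`."""
    result: Set[Sequence] = set()
    for i, itemset in enumerate(pattern):
        for item in itemset:
            reduced = itemset - {item}
            # Drop the itemset entirely if it became empty.
            middle = (reduced,) if reduced else ()
            candidate = pattern[:i] + middle + pattern[i + 1:]
            if candidate:
                result.add(candidate)
    return result


def is_generator(db: List[Sequence], pattern: Sequence, min_sup: int) -> bool:
    """Frequent, and every one-item deletion has strictly larger support."""
    sup = support(db, pattern)
    if sup < min_sup:
        return False
    return all(support(db, sub) > sup for sub in one_item_deletions(pattern))


if __name__ == "__main__":
    db = [seq("a", "c"), seq("a", "b", "c"), seq("b"), seq("b", "c")]
    print(support(db, seq("a", "c")))          # 2
    print(is_generator(db, seq("a", "c"), 1))  # False: <a> also has support 2
    print(is_generator(db, seq("b", "c"), 1))  # True
```

In this toy database, every sequence containing `a` also contains `c` after it, so `<a, c>` carries no more information than `<a>` and is not a generator, whereas `<b, c>` is strictly rarer than both `<b>` and `<c>`.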

