Abstract

Sequential pattern mining is an important data mining task that discovers sequential patterns and the relationships among them in many real-world applications. In practice, sequence databases change over time along with the business they support, and for various reasons some sequences must be deleted from the database. Keeping the discovered sequential patterns synchronized with the database from which they were mined then requires revisiting the mining task, which raises several challenges. Because sequential pattern mining is computationally expensive and the number of deleted sequences is often small relative to the size of the entire database, re-mining the updated database from scratch can incur a high cost. This paper presents an efficient incremental solution for sequential pattern mining with sequence deletions. Unlike existing work, we propose an expanded prefix tree that extends the existing prefix tree with additional structures capturing the information needed for incremental mining. Based on this tree, we propose an incremental sequential pattern mining algorithm, SPMD, that finds the complete set of sequential patterns without re-scanning the original database when a number of sequences are deleted. Experimental results on benchmark databases confirm that SPMD requires less running time than re-mining from scratch with the PrefixSpan algorithm.
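
The general idea of updating pattern supports after deletions without re-scanning the database can be illustrated with a small sketch. This is not the SPMD algorithm or the expanded prefix tree, whose details the abstract does not give. It assumes sequences of single items, uses a flat dictionary in place of a tree, and enumerates every subsequence up to a fixed length, which is only feasible for toy data. The function names `build_index`, `delete_sequences`, and `frequent` are hypothetical.

```python
from itertools import combinations


def build_index(db, max_len=3):
    """Map each subsequence (as a tuple of items) to the ids of sequences containing it.

    Assumption: each sequence is an ordered collection of single items.
    Enumeration is exponential in max_len, so this is for toy data only.
    """
    index = {}
    for sid, seq in db.items():
        patterns = set()
        for k in range(1, min(max_len, len(seq)) + 1):
            # combinations() preserves positional order, so each tuple is a subsequence.
            patterns.update(combinations(seq, k))
        for pat in patterns:
            index.setdefault(pat, set()).add(sid)
    return index


def delete_sequences(index, deleted_ids):
    """Update supports in place from the index alone, without reading the database."""
    deleted = set(deleted_ids)
    for pat in list(index):
        index[pat] -= deleted
        if not index[pat]:
            del index[pat]


def frequent(index, db_size, min_sup):
    """Return patterns whose relative support is at least min_sup, with their counts."""
    return {
        pat: len(sids)
        for pat, sids in index.items()
        if len(sids) >= min_sup * db_size
    }


if __name__ == "__main__":
    db = {1: "abc", 2: "acb", 3: "ab", 4: "bc"}
    index = build_index(db)
    print(sorted(frequent(index, len(db), 0.5)))

    # Delete sequence 4 and update the patterns without touching db again.
    delete_sequences(index, [4])
    print(sorted(frequent(index, len(db) - 1, 0.5)))
```

The sketch keeps infrequent patterns in the index as well. With a relative support threshold, the database shrinks after a deletion, so a pattern that was previously infrequent can become frequent; keeping the extra information avoids a re-scan in that case, at the cost of memory.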
