Abstract
Sequential pattern mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, current approaches prove inefficient when dealing with large input datasets, a large number of different symbols, and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern-growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state-of-the-art algorithms.
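To make the idea of depth-first pattern growth over pseudo-projections concrete, the following is a minimal Python sketch. It is not the NLDFT algorithm itself: the abstract does not specify the node linkage, so the sketch falls back on a generic PrefixSpan-style recursion. The function name `mine_sequential_patterns`, its parameters, and the representation of a pseudo-projection as `(sequence_index, start_offset)` pairs are assumptions made for illustration only.

```python
from collections import defaultdict


def mine_sequential_patterns(sequences, min_support):
    """Generic depth-first pattern growth (illustrative, not NLDFT).

    A pseudo-projection is a list of (sequence_index, start_offset)
    pointers into the original data, so no suffix is ever copied.
    min_support is an absolute count of supporting sequences.
    """
    patterns = []

    def grow(prefix, projection):
        # For every symbol, collect one pointer per supporting sequence:
        # the position right after its first occurrence past the offset.
        extensions = defaultdict(list)
        for seq_idx, start in projection:
            seq = sequences[seq_idx]
            seen = set()
            for pos in range(start, len(seq)):
                symbol = seq[pos]
                if symbol not in seen:
                    seen.add(symbol)
                    extensions[symbol].append((seq_idx, pos + 1))
        # Recurse into each frequent extension before trying the next,
        # so only the projections of the current branch stay alive.
        for symbol in sorted(extensions):
            pointers = extensions[symbol]
            if len(pointers) >= min_support:
                pattern = prefix + (symbol,)
                patterns.append((pattern, len(pointers)))
                grow(pattern, pointers)

    grow((), [(i, 0) for i in range(len(sequences))])
    return patterns
```

For example, calling `mine_sequential_patterns(["abc", "acb", "ab"], 2)` treats each string as a sequence of single-character symbols and returns every pattern contained in at least two of them, paired with its support count.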
Highlights
Since Agrawal [1] proposed the problem of sequential pattern mining, it has become an important data mining problem, mainly because of its wide variety of applications.
Sequential pattern mining methods have been used in applications such as mining web usage behaviour [2], [3], [4], drug-drug interaction detection [5], text mining tasks such as document clustering [6], question answering [7], and authorship attribution [8], touring path suggestion [9], CRM strategies for online shopping [10], and mining anomalous events in surveillance videos from commercial environments [11], among others.
In this work, we address the problem of mining frequent sequences of symbols, which will be referred to as sequential pattern mining in the rest of this article.
Summary
Since Agrawal [1] proposed the problem of sequential pattern mining, it has become an important data mining problem, mainly because of its wide variety of applications. Sequential pattern mining methods have been used in applications such as mining web usage behaviour [2], [3], [4], drug-drug interaction detection [5], text mining tasks such as document clustering [6], question answering [7], and authorship attribution [8], touring path suggestion [9], CRM strategies for online shopping [10], and mining anomalous events in surveillance videos from commercial environments [11], among others. However, current approaches become inefficient on large input datasets, because most of the popular data mining methods were created when the common dataset size was several orders of magnitude smaller [12].