Abstract

Discovering concise representations of sequential patterns in sequential data is a well-established data mining task. Recently, Nica et al. put forward an original approach, RCA-Seq, for directly extracting a hierarchy of multilevel closed partially-ordered patterns (MCPO-patterns) from a sequence database within the Relational Concept Analysis (RCA) framework. RCA-Seq has been applied successfully to small (∼1,000 sequences) but interesting real hydro-ecological datasets. However, RCA-Seq focuses on providing comprehensible results, to the detriment of performance. To improve on this, we propose a new approach, FastRCA-Seq, which stems from RCA-Seq and whose contributions benefit two fields: Formal Concept Analysis, specifically its RCA extension, and sequential pattern mining. FastRCA-Seq spans two key steps: the exploration of sequential data based on RCA, and the extraction of MCPO-patterns by navigating the RCA result. First, our approach introduces an effective RCA implementation based on bit-array representations, bitwise operations, parallel computing, and several new properties of RCA that may prevent expensive computations. In addition, we identify the bottleneck of RCA. Second, FastRCA-Seq is a self-contained approach for directly and efficiently mining hierarchies of MCPO-patterns from sequential data. We assess FastRCA-Seq on several benchmark datasets, namely Gazelle, Kosarak, and FIFA. The results show that FastRCA-Seq outperforms RCA-Seq in terms of execution time (on average ∼169 times faster) and memory usage (on average ∼42% less) while preserving the interpretability and usability of the results for stakeholders.
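
To give an intuition for why bit-array representations and bitwise operations suit concept analysis, the following is a minimal sketch and not the FastRCA-Seq implementation. It assumes a hypothetical toy formal context (objects `s1` to `s4`, attributes `a` to `c`) and uses Python integers as bit arrays. All names (`OBJECTS`, `attribute_columns`, `extent`, `intent`, `closure`, `decode`) are illustrative assumptions. Each attribute is stored as one bit column over the objects, so the set of objects sharing several attributes is a bitwise AND, and a subset test is a single AND-NOT.

```python
# Hypothetical toy formal context: which object has which attributes.
OBJECTS = ["s1", "s2", "s3", "s4"]
ATTRIBUTES = ["a", "b", "c"]
INCIDENCE = {
    "s1": {"a", "b"},
    "s2": {"a", "c"},
    "s3": {"a", "b", "c"},
    "s4": {"b"},
}


def attribute_columns(objects, attributes, incidence):
    """Encode each attribute as an int whose bit i is set when object i has it."""
    columns = {}
    for attr in attributes:
        bits = 0
        for i, obj in enumerate(objects):
            if attr in incidence[obj]:
                bits |= 1 << i
        columns[attr] = bits
    return columns


def extent(attrs, columns, n_objects):
    """Objects that have every attribute in attrs: AND of the attribute columns."""
    bits = (1 << n_objects) - 1  # start from "all objects"
    for attr in attrs:
        bits &= columns[attr]
    return bits


def intent(extent_bits, columns):
    """Attributes shared by every object in extent_bits (subset test via AND-NOT)."""
    return {attr for attr, col in columns.items() if extent_bits & ~col == 0}


def closure(attrs, columns, n_objects):
    """Closed attribute set generated by attrs: intent of its extent."""
    return intent(extent(attrs, columns, n_objects), columns)


def decode(bits, objects):
    """Turn a bit array back into a list of object names."""
    return [obj for i, obj in enumerate(objects) if bits >> i & 1]


COLUMNS = attribute_columns(OBJECTS, ATTRIBUTES, INCIDENCE)
shared_ab = decode(extent({"a", "b"}, COLUMNS, len(OBJECTS)), OBJECTS)
closed_c = closure({"c"}, COLUMNS, len(OBJECTS))
```

In this toy context, `shared_ab` lists the objects having both `a` and `b`, and `closed_c` is the closed attribute set generated by `c`. A production implementation would more likely rely on fixed-width machine words or a dedicated bitset library, and could process independent columns in parallel, but the set operations reduce to the same bitwise primitives.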
