Significance of Data Structures and Data Retrieval Techniques on Sequence Rule Mining Efficacy

Et Al Nayanjyoti Mazumdar

doi:10.17762/ijritcc.v11i9.8838

Abstract

Sequence mining intends to discover rules from diverse datasets by implementing Rule Mining Algorithms with efficient data structures and data retrieval techniques. Traditional algorithms struggle in handling variable support measures which may involve repeated reconstruction of the underlying data structures with changing thresholds. To address these issues the premiere Sequence Mining Algorithm, AprioriAll is implemented against an Educational and a Financial Dataset, using the HASH and the TRIE data structures with scan reduction techniques. Primary idea is to study the impact of data structures and retrieval techniques on the rule mining process in handling diverse datasets. Performance Evaluation Matrices- Support, Confidence and Lifts are considered for testing the efficacies of the algorithm in terms of memory requirements and execution time complexities. Results unveil the excellence of Hashing in tree construction time and memory overhead for fixed sets of pre-defined support thresholds. Whereas, TRIE may avoid reconstruction and is capable of handling dynamic support thresholds, leading to shorter rule discovery time but higher memory consumption. This study highlights the effectiveness of Hash and TRIE data structures considering the dataset characteristics during rule mining. It underscores the importance of appropriate data structures based on dataset features, scanning techniques, and user-defined parameters.

Full Text