A Quantified Approach for Large Dataset Compression in Association Mining

Pratush Jadoun

doi:10.9790/0661-1537984

Abstract

With the rapid development of computer and information technology over the last several decades, enormous amounts of scientific and engineering data are continuously generated on a massive scale, and data compression is needed to reduce cost and storage space. Compressing data and discovering association rules by identifying relationships among sets of items in a transaction database are important problems in data mining. Finding frequent itemsets is computationally the most expensive step in association rule discovery and has therefore attracted significant research attention. However, existing compression algorithms are not well suited to data mining on large data sets. This research describes a new approach in which the original dataset is sorted in lexicographical order and a desired number of groups is formed to generate quantification tables. These quantification tables are used to generate the compressed dataset, from which the complete set of frequent itemsets can be mined more efficiently. The experimental results show that the proposed algorithm performs better than the mining merge algorithm when compared across different support values and execution times.
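The pipeline outlined in the abstract (lexicographic sort, grouping, quantification tables) can be illustrated with a minimal sketch. The abstract does not specify the table layout or the grouping rule, so everything here is an assumption made for illustration: groups are contiguous chunks of near-equal size, each quantification table is simply a per-group map from item to occurrence count, and the function names are hypothetical. The sketch recovers single-item support only and is not the author's algorithm.

```python
from collections import Counter

def sort_lexicographically(transactions):
    # Sort the items inside each transaction, then sort the transactions.
    return sorted(tuple(sorted(set(t))) for t in transactions)

def form_groups(sorted_transactions, num_groups):
    # Assumption: groups are contiguous chunks of near-equal size.
    size, extra = divmod(len(sorted_transactions), num_groups)
    groups, start = [], 0
    for i in range(num_groups):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            groups.append(sorted_transactions[start:end])
        start = end
    return groups

def quantification_table(group):
    # Assumption: a table maps each item to its count within the group.
    counts = Counter()
    for transaction in group:
        counts.update(transaction)
    return dict(counts)

def item_support(tables, item):
    # Support of a single item is the sum of its counts over all tables.
    return sum(table.get(item, 0) for table in tables)

# Toy transaction database (illustrative data, not from the paper).
transactions = [
    ["b", "a", "c"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b"],
    ["c", "d"],
    ["a", "b", "c"],
]

ordered = sort_lexicographically(transactions)
groups = form_groups(ordered, num_groups=2)
tables = [quantification_table(g) for g in groups]

for item in ["a", "b", "c", "d"]:
    print(item, item_support(tables, item))
```

Under these assumptions, single-item supports can be read from the tables without rescanning the original transactions. Mining larger itemsets would require a richer table structure than this sketch provides.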

Full Text