Analysis of the progressive sampling-based approach using real life datasets

Venkatapathy Umarani, Muthusamy Punithavalli

doi:10.2478/s13537-011-0016-y

Open Access

https://doi.org/10.2478/s13537-011-0016-y

Journal: Open Computer Science
Publication Date: Jan 1, 2011
Citations: 27
License type: cc-by-nc-nd

Affiliation: Anna University, Chennai

Abstract

The discovery of association rules is an important and challenging data mining task. Most existing algorithms for finding association rules require multiple passes over the entire database, and the I/O overhead incurred is extremely high for very large databases. An obvious way to reduce the cost of association rule mining is sampling, and several sampling-based approaches have recently been developed to speed up the process. An efficient progressive sampling-based approach is presented for mining association rules from large databases. First, frequent itemsets are mined from an initial sample, and the negative border is then computed from the mined frequent itemsets. Based on the support computed for the midpoint itemset in the sorted negative border, either the sample size is increased or association rules are mined from the current sample. In this paper, we present an extensive analysis of the progressive sampling-based approach on different real-life datasets and, in addition, the performance of the approach is evaluated with the well-known association rule mining algorithm Apriori. The experimental results show that the progressive sampling-based approach improves both accuracy and computation time when mining association rules from the real-life datasets.
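The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the negative border is sorted, what test is applied to the midpoint itemset, or how the sample grows. The sketch assumes the border is sorted by sample support, that the sample is enlarged when the midpoint itemset (infrequent in the sample) turns out to be frequent in the full database, and that the sample grows by a fixed `step`. All function and parameter names are hypothetical, and rule generation from the final sample is omitted.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Plain level-wise (Apriori-style) search; returns {itemset: support}."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        kept = {}
        for c in level:
            s = support(c, transactions)
            if s >= min_support:
                kept[c] = s
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def negative_border(frequent, items):
    """Itemsets that are not frequent although all their proper subsets are."""
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    for f in frequent:
        for i in items:
            c = f | {i}
            if i in f or c in frequent:
                continue
            if all(frozenset(s) in frequent for s in combinations(c, len(f))):
                border.add(c)
    return border


def progressive_sample(db, min_support, initial_size, step, seed=0):
    """Grow a random sample until the midpoint test passes (step must be > 0)."""
    order = list(range(len(db)))
    random.Random(seed).shuffle(order)
    items = sorted({i for t in db for i in t})
    size = min(initial_size, len(db))
    while True:
        sample = [db[j] for j in order[:size]]
        frequent = frequent_itemsets(sample, min_support)
        # ASSUMPTION: sort the border by sample support (ties broken by items).
        border = sorted(negative_border(frequent, items),
                        key=lambda c: (support(c, sample), sorted(c)))
        if not border or size == len(db):
            return sample, frequent
        midpoint = border[len(border) // 2]
        # ASSUMPTION: accept the sample if the midpoint itemset is also
        # infrequent in the full database; otherwise enlarge the sample.
        if support(midpoint, db) < min_support:
            return sample, frequent
        size = min(size + step, len(db))
```

Under these assumptions, each iteration checks only one itemset against the full database rather than the whole negative border, which is what keeps the cost of each iteration low compared with a full multi-pass run.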

Full Text