Abstract

The vertical association rules mining algorithm is an efficient mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It avoids the repeated database scans required by the Apriori algorithm. In vertical mining, frequent itemsets can be represented in memory as a set of bit vectors, which enables fast computation. The size of these bit vectors is the algorithm's main space cost and limits its scalability. This paper therefore presents an algorithm that compresses the bit vectors of frequent itemsets. The new bit vector schema relies on Boolean algebra rules to compute the intersection of two compressed bit vectors without any costly decompression. Experimental results show that the proposed algorithm, Vertical Boolean Mining (VBM), outperforms both the Apriori algorithm and the classical vertical association rule mining algorithm in mining time and memory usage.
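The idea of computing support from bit vectors can be illustrated with a minimal sketch. It assumes a hypothetical four-transaction database and uses plain Python integers as uncompressed bit vectors; the names `build_bit_vectors`, `support`, and `frequent_pairs` are illustrative and do not come from the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def build_bit_vectors(transactions):
    """Map each item to an int whose bit i is set if transaction i contains it."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors

def support(bit_vector):
    """Support is the number of set bits, i.e. transactions containing the itemset."""
    return bin(bit_vector).count("1")

def frequent_pairs(vectors, min_support):
    """Intersect item bit vectors pairwise; the database is not scanned again."""
    result = {}
    for a, b in combinations(sorted(vectors), 2):
        joint = vectors[a] & vectors[b]  # bitwise AND = intersection of support sets
        if support(joint) >= min_support:
            result[(a, b)] = support(joint)
    return result

vectors = build_bit_vectors(transactions)  # the single pass over the database
pairs = frequent_pairs(vectors, min_support=2)
```

The database is read once to build the vectors; every later support count is a bitwise AND followed by a bit count, which is the property that distinguishes the vertical approach from Apriori's repeated scans.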

Highlights

  • Data mining is defined as “The non trivial extraction of implicit, previously unknown and potentially useful information from databases” [1]

  • Because of the large number of resulting frequent itemsets, the method org.apache.commons.io.FileUtils.contentEquals from the package commons-io-2.4.jar, downloaded from the Apache library, is used to compare the results of the new algorithm with those of the Apriori algorithm and the classical vertical association rule mining algorithm without compressed bitmaps, to verify that the results are correct

  • Experiments compared the total memory usage of Vertical Boolean Mining (VBM) with that of the vertical association rules algorithm without compressed bitmaps

Summary

INTRODUCTION

Data mining is defined as “The non trivial extraction of implicit, previously unknown and potentially useful information from databases” [1]. Non-frequent items are detected by scanning the database once for each itemset to calculate its support. This is the most important shortcoming of the Apriori algorithm [9]. To overcome this issue, this paper proposes an algorithm that relies on a simple representation of frequent itemsets: it compresses the support-set bitmaps of the itemsets that are loaded into memory, which saves the space the algorithm requires. This reduces both the execution time and the required memory.
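To make the notion of intersecting compressed bitmaps without decompression concrete, the sketch below uses run-length encoding as a stand-in. This is an assumption made purely for illustration: the summary does not specify VBM's Boolean-algebra schema, so the code is not the paper's method, and the names `rle_encode`, `rle_and`, and `rle_support` are hypothetical.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into [bit, run_length] pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs

def rle_and(left, right):
    """AND two run-length-encoded vectors of equal length, run by run.

    Stand-in technique (not VBM's schema): the inputs are never expanded
    back into individual bits.
    """
    out = []
    i = j = 0
    left_rem = left[0][1] if left else 0
    right_rem = right[0][1] if right else 0
    while i < len(left) and j < len(right):
        step = min(left_rem, right_rem)  # overlap of the two current runs
        bit = left[i][0] & right[j][0]
        if out and out[-1][0] == bit:
            out[-1][1] += step           # extend the previous run
        else:
            out.append([bit, step])
        left_rem -= step
        right_rem -= step
        if left_rem == 0:
            i += 1
            left_rem = left[i][1] if i < len(left) else 0
        if right_rem == 0:
            j += 1
            right_rem = right[j][1] if j < len(right) else 0
    return out

def rle_support(runs):
    """Support is the total length of the runs of ones."""
    return sum(length for bit, length in runs if bit == 1)

# Hypothetical support-set bitmaps for two itemsets over eight transactions.
a = [1, 1, 1, 0, 0, 0, 1, 1]
b = [0, 1, 1, 1, 0, 0, 0, 1]
joint = rle_and(rle_encode(a), rle_encode(b))
joint_support = rle_support(joint)
```

The work done by `rle_and` grows with the number of runs rather than the number of transactions, which is the kind of saving a compressed representation is meant to provide when bitmaps contain long runs of identical bits.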

BASIC CONCEPTION
VERTICAL ASSOCIATION RULES MINING
BOOLEAN ALGEBRA
How to intersect two compressed bit vectors and calculate their support
How VBM algorithm works
EXPERIMENTAL RESULTS
CONCLUSION