Abstract

Frequent itemset mining (FIM) is an important topic in data mining that extracts knowledge about the relationships among items in a transaction dataset. The Apriori algorithm and its variants, the Apriori-like algorithms, are widely used for FIM. In a big data environment, however, these algorithms are inefficient. Because they iteratively compute and modify intermediate results, applying an Apriori-like algorithm to a high-dimensional or large-scale dataset demands more memory than a single machine can provide. Parallel and distributed programming can address big data problems, but Apriori-like algorithms are poorly suited to parallel computing because iteratively updating intermediate results held in cluster memory incurs extra communication overhead. To solve this problem, we propose a novel FIM algorithm, Distributed Apriori Based on Itemset-Encoding (DABIE). Compared with existing methods, DABIE has two main advantages. First, it stores intermediate results in a binary (0/1) encoding to reduce memory usage. Second, it generates frequent itemsets through logical operations on these encodings, which reduces modification of data in cluster memory. These two advantages make DABIE better suited to cluster computing. We apply DABIE to datasets of different scales. Compared with other distributed Apriori-like algorithms, our experimental results show that DABIE improves the efficiency of multi-iteration FIM in a big data environment.
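To make the idea of a 0/1 encoding combined by logical operations concrete, the following is a minimal single-machine sketch. It is not the DABIE algorithm itself, whose encoding the abstract does not specify. It assumes one common scheme, a vertical bitmap, in which each itemset is stored as an integer whose bit i is 1 when transaction i contains the itemset. The function names `encode_items` and `frequent_itemsets` are hypothetical, and distribution across a cluster is left out entirely.

```python
from itertools import combinations


def encode_items(transactions):
    """Map each item to an int whose bit i is 1 if transaction i contains it."""
    bitmaps = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(bits):
    """Number of transactions covered by a bitmap (count of 1 bits)."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search over bit-encoded itemsets.

    Returns a dict mapping each frequent itemset (a sorted tuple of items)
    to its support count. Items must be mutually comparable, e.g. all strings.
    """
    item_bits = encode_items(transactions)
    # Level 1: keep only the single items that meet the support threshold.
    level = {
        (item,): bits
        for item, bits in item_bits.items()
        if support(bits) >= min_support
    }
    result = {}
    while level:
        result.update({itemset: support(bits) for itemset, bits in level.items()})
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            # Join two k-itemsets only when they share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # One bitwise AND yields the candidate's bitmap; the original
            # transactions are never rescanned or modified.
            bits = level[a] & level[b]
            if support(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level
    return result
```

Calling `frequent_itemsets` with a list of item sets and a minimum support count returns every frequent itemset with its support. Each level is derived only from the bitmaps of the previous level, which is the property the abstract attributes to encoding-based generation: intermediate results are compact and are read rather than updated in place.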
