Abstract

Aiming at the problems of low efficiency and high time and space cost in association rule mining, this paper proposes an incremental association rule mining algorithm based on Hadoop load balancing. By using a tree structure, the algorithm does not need to rescan the original database to generate frequent itemsets when the data in the database is continually updated, and it applies load balancing during data distribution so that the master node distributes the data evenly to the child nodes. The experiments follow the control variable method, varying the minimum support and the sample increment separately. The experimental results show that when the minimum support is fixed and the transaction data set grows, the incremental association rule mining algorithm based on Hadoop load balancing takes less than 14.3% of the running time of the Apriori algorithm, mines more association rules than the Apriori algorithm, and uses much less memory than the Apriori algorithm. When the total amount of transaction data is fixed and the minimum support is varied, the memory usage of the Hadoop-based algorithm is also smaller than that of the Apriori algorithm. Overall, the Hadoop-based incremental association rule mining algorithm offers some improvements in memory usage and efficiency.
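The two ideas summarized above, updating frequent itemsets from new transactions only and spreading each batch evenly over worker nodes, can be illustrated with a minimal single-process sketch. It is not the paper's implementation: a plain `Counter` stands in for the tree structure, the "nodes" are simulated by list shards rather than Hadoop tasks, and the names `IncrementalMiner`, `split_evenly`, `count_itemsets`, `n_workers`, and `max_size` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items in one shard (the "map" step)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def split_evenly(transactions, n_workers):
    """Round-robin partition: shard sizes differ by at most one transaction."""
    return [transactions[i::n_workers] for i in range(n_workers)]


class IncrementalMiner:
    """Keeps running itemset counts so earlier batches are never rescanned.

    Assumption: a flat Counter replaces the paper's tree structure.
    """

    def __init__(self, n_workers=3, max_size=2):
        self.n_workers = n_workers
        self.max_size = max_size
        self.counts = Counter()
        self.total = 0

    def add_batch(self, transactions):
        # Only the new batch is scanned; each shard is counted independently
        # and the partial counts are merged (the "reduce" step).
        for shard in split_evenly(transactions, self.n_workers):
            self.counts.update(count_itemsets(shard, self.max_size))
        self.total += len(transactions)

    def frequent(self, min_support):
        """Return itemsets whose support is at least min_support (a fraction)."""
        threshold = min_support * self.total
        return {s: c for s, c in self.counts.items() if c >= threshold}


miner = IncrementalMiner(n_workers=2)
miner.add_batch([["a", "b"], ["a", "c"], ["a", "b", "c"]])
first = miner.frequent(0.6)
miner.add_batch([["b", "c"], ["b", "c"]])  # increment: old data is not re-read
second = miner.frequent(0.6)
```

In this toy run the pair `("b", "c")` is not frequent after the first batch but becomes frequent after the increment, while `("a", "b")` drops out, which is the kind of change an incremental miner has to track without revisiting the original transactions. Storing counts for all itemsets is only practical for small `max_size`; a tree structure, as the paper uses, is one way to keep that state compact.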
