The Apriori algorithm is one of the best-known and most widely implemented data mining methods for mining frequent itemsets. In recent years, a number of algorithms on both parallel and distributed platforms have improved the effectiveness of Apriori; they differ from one another in the load-balancing strategy, memory system, data decomposition technique, and data layout used in their implementation. Most of the issues that arise with distributed frameworks stem from the operating costs of managing distributed systems and the absence of high-level parallel programming languages. In addition, in grid computing there is always a possibility that a node will fail, causing the task to be re-executed multiple times. The MapReduce approach developed by Google can address these kinds of issues. MapReduce is a programming model for large-scale distributed processing of data on large clusters of commodity computers; it is effective, scalable, and easy to use, and it is also utilised in cloud computing. This paper presents an enhanced version of the Apriori algorithm, referred to as Improved Parallel and Distributed Apriori (IPDA), built on the scalable Hadoop MapReduce environment for analysing Big Data. By generating split-frequent data locally and eliminating infrequent data early, the proposed work aims primarily to reduce the enormous demands placed on available resources as well as the communication overhead incurred whenever frequent data are retrieved. The paper reports experimental results showing that IPDA outperforms traditional Apriori and parallel and distributed Apriori in terms of execution time, the number of rules generated, and behaviour across various minimum support values.
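To make the MapReduce formulation concrete, the sketch below shows how a single counting pass of a parallel Apriori can be expressed on Hadoop: mappers emit candidate items from each transaction, and reducers sum the partial counts and keep only itemsets that reach the minimum support. This is a minimal illustration under stated assumptions, not the authors' IPDA implementation; the class names (AprioriPassOne, ItemMapper, SupportReducer) and the configuration key apriori.minsupport are illustrative, and only the standard Hadoop MapReduce API is used.

```java
// Minimal sketch of one MapReduce counting pass for frequent 1-itemsets.
// NOT the IPDA algorithm from the paper; names and config keys are assumptions.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AprioriPassOne {

    // Mapper: each input line is one transaction (space-separated items).
    // Emits (item, 1) for every item, i.e. the candidate 1-itemsets.
    public static class ItemMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text item = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    item.set(token);
                    context.write(item, ONE);
                }
            }
        }
    }

    // Reducer: sums partial counts and keeps only itemsets whose global
    // support reaches the user-supplied minimum support threshold.
    public static class SupportReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private int minSupport;

        @Override
        protected void setup(Context context) {
            minSupport = context.getConfiguration().getInt("apriori.minsupport", 2);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            if (count >= minSupport) {  // prune infrequent itemsets
                context.write(key, new IntWritable(count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args: <input dir> <output dir> <minimum support>
        Configuration conf = new Configuration();
        conf.setInt("apriori.minsupport", Integer.parseInt(args[2]));

        Job job = Job.getInstance(conf, "apriori pass 1");
        job.setJarByClass(AprioriPassOne.class);
        job.setMapperClass(ItemMapper.class);
        job.setReducerClass(SupportReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In a full parallel Apriori, subsequent jobs would generate candidate k-itemsets from the frequent (k-1)-itemsets and repeat this count-and-prune pass; an IPDA-style optimisation, as described in the abstract, additionally prunes locally infrequent data within each split before the shuffle, which is where the reduction in communication overhead would come from.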