Abstract

Association rule mining is a frequently used data mining technique. It aims to find the items that frequently occur together in the data. Current big data processing and analysis platforms are not focused on data mining, so they do not offer large-scale libraries for association rule mining algorithms. In this research, a library of association rule mining algorithms has been developed for a big data processing platform. Apache Spark was chosen as the platform for the case study because it is widely used. Different algorithms have been implemented on this platform to benefit from the Map-Reduce programming model. In this context, the Apriori, Eclat and Pascal algorithms are implemented for the big data platform. The library created with the implementation method we suggest is comparatively analyzed in terms of performance metrics on big data processing platforms with single and multiple nodes. The methods implemented in this research are also compared with the FP-Growth algorithm implementation provided by the Spark platform. The results of our research show that, when tested on large-scale data, the Apriori algorithm performs much better than the other algorithms when moving from a single-node cluster environment to a multi-node cluster environment.
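
To illustrate how Apriori lends itself to the Map-Reduce model, the following is a minimal sketch in plain Python and is not the library's implementation. It assumes the transactions are already split into partitions, and the names `count_candidates` and `apriori` are hypothetical. Each partition is counted independently (the map phase), the per-partition counts are summed (the reduce phase), and the surviving itemsets are joined and pruned to form the next level of candidates.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates):
    """Map phase (hypothetical helper): count candidates within one partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def apriori(partitions, min_support):
    """Return {itemset: support count} for itemsets seen at least min_support times."""
    # Level 1 candidates: every distinct item as a one-element itemset.
    candidates = {
        frozenset([item])
        for partition in partitions
        for transaction in partition
        for item in transaction
    }
    frequent = {}
    k = 1
    while candidates:
        # Reduce phase: sum the per-partition counts.
        total = Counter()
        for partition in partitions:
            total.update(count_candidates(partition, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-itemset candidates.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c
            for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

On a real cluster the per-partition counting would run in parallel on the worker nodes; here the loop over `partitions` only stands in for that step.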
