Abstract

The immense amount of data generated on a daily basis by various devices and systems necessitates a change in data analysis methods. As an important part of analytics, data mining methods require a paradigm shift to solve problems because the old methods cannot manage massive data. Association rule mining is a data mining algorithm used to solve various domain problems. Because of the immense volume of data, one-node solutions are no longer useful, and it is necessary to solve problems by using a distributed and shared-nothing architecture such as Map-Reduce. However, when association rule mining is transferred to these architectures, new problems appear. The main problems are lack of data locality and iteration support and process skewness. In this paper, a method is proposed that solves these problems. Kavosh converts data into a unified format that helps nodes perform their tasks independently without the need to exchange data with other nodes. In addition, the proposed method compresses input data to facilitate data management. Another advantage is the lack of process skewness because it is possible to allocate a predefined amount of data to each node. Kavosh omits iterations required for finding frequent itemsets by changing the Map-Reduce architecture. The proposed method is implemented using Hadoop, and the results are compared with open-source products in terms of three aspects: execution time, load balancing and data compression. The results show that Kavosh outperforms other methods in these aspects.

Highlights

  • With the growth of information, traditional analysis methods must be modified because they cannot handle immense amounts of data

  • Evaluation The proposed method is evaluated on the TPC Benchmark DS (TPC-DS) dataset and a real-world dataset

  • Kavosh is a Map-Reduce-based association rule mining method that can manage an immense amount of data

Read more

Summary

Introduction

With the growth of information, traditional analysis methods must be modified because they cannot handle immense amounts of data. Data mining algorithms are analytics that require a paradigm shift for algorithm execution and changes for deployment over nodes. Data mining algorithms must be modified so they can be executed over scalable and distributable environments. Modifying data mining algorithms for distributed architecture is not easy. One problem with distributed architecture is data locality. This occurs when the required data for processing do not exist on the processor node

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.