Abstract

Frequent itemset mining is a popular technique for discovering hidden knowledge in large-scale transactional datasets across a wide range of applications. The Apriori algorithm is a typical algorithm for finding frequent itemsets in market basket analysis. Since its inception, many efforts have been made to improve the efficiency of the original algorithm. The MapReduce model is an efficient tool for parallel and distributed computing, allowing large-scale data algorithms such as Apriori to achieve better speedup and related performance measures. A major drawback of the MapReduce model is that it is poorly suited to iterative jobs because of the overhead it imposes. Apache Spark is now attracting considerable attention for iterative jobs because of its in-memory processing capabilities. Most frequent pattern mining algorithms consider only the distinct items in a transaction. For transactional data analysis, however, multiple occurrences of an item, in other words the “quantities” in which a particular item is purchased in the same transaction, can yield additional information about frequent itemsets. In this work, we propose a modified version of the Apriori algorithm on the Apache Spark framework that not only mines the frequent itemsets in the input transactional data but also analyzes the quantities of the items in each itemset to find the most frequently purchased quantity for every frequent itemset. Experiments are conducted to evaluate the effectiveness, efficiency, and scalability of the proposed approach.
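
To make the idea of quantity-aware mining concrete, the following is a minimal single-machine sketch in plain Python, not the Spark implementation the abstract refers to. It assumes each transaction is a dictionary mapping an item to its purchased quantity, and it interprets the "most frequent quantity" of an itemset as the most common tuple of per-item quantities among the transactions that contain the itemset. That interpretation, the function name `mine_with_quantities`, and its parameters are illustrative assumptions rather than details taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_with_quantities(transactions, min_support):
    """Return {itemset: (support_count, most_common_quantity_tuple)}.

    transactions: list of dicts mapping item -> purchased quantity (assumed layout).
    min_support: minimum number of supporting transactions; must be at least 1.
    Itemsets are tuples of items in sorted order; quantity tuples follow that order.
    """
    results = {}
    # Level 1 candidates: every distinct item as a 1-itemset.
    candidates = {(item,) for t in transactions for item in t}
    k = 1
    while candidates:
        # For each candidate, count the quantity tuples seen in supporting transactions.
        quantity_counts = {c: Counter() for c in candidates}
        for t in transactions:
            for c in candidates:
                if all(item in t for item in c):
                    quantity_counts[c][tuple(t[item] for item in c)] += 1

        # Keep candidates that meet the support threshold.
        frequent = {}
        for c, counts in quantity_counts.items():
            support = sum(counts.values())
            if support >= min_support:
                frequent[c] = (support, counts.most_common(1)[0][0])
        results.update(frequent)

        # Generate (k+1)-candidates from items that are still frequent, then
        # prune any candidate with an infrequent k-subset (Apriori property).
        k += 1
        items = sorted({item for c in frequent for item in c})
        candidates = {
            c for c in combinations(items, k)
            if all(sub in frequent for sub in combinations(c, k - 1))
        }
    return results
```

In a Spark setting, the counting pass over transactions would presumably be the part distributed across workers, while candidate generation could stay on the driver; this sketch does not attempt to show that split.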
