Abstract

Frequent itemset extraction is central to many data mining applications. It aims to extract interesting patterns, such as association rules, correlations, and clusters, from a given database. Computing frequent itemsets quickly from the available database is difficult. Several algorithms exist for finding frequent itemsets, such as Apriori and FP-growth. Unfortunately, these algorithms fail to extract interesting itemsets when the volume of data becomes very large. In a distributed environment, the computation must not only be parallelized automatically but also have its workload balanced well, which these algorithms do not support. To overcome these disadvantages, an algorithm is needed that supplies the missing elements, namely automatic parallelization and workload balancing. This paper proposes a new algorithm for the extraction of frequent itemsets using Hadoop and the MapReduce paradigm. The proposed algorithm is based on a modified Apriori algorithm and is named Frequent Itemset Mining using Modified Apriori (FIMMA). In this method, the database is distributed across a number of mappers, which work independently and concurrently and use a hashing technique to handle large databases; their results are passed to the reducers. The reducers produce the final result, showing the most frequent itemsets.
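
The abstract does not spell out FIMMA's internals, so the following is only a minimal sketch of the general idea it describes: transactions are split across mappers, each mapper counts candidate itemsets in a hash table, and a reducer merges the partial counts and keeps the itemsets that meet a support threshold. It runs in a single Python process rather than on Hadoop, and every name in it (`partition`, `mapper`, `reducer`, `frequent_itemsets`, `min_support`, `num_mappers`, `max_k`) is a hypothetical choice made for illustration, not something taken from the paper.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, num_mappers):
    """Split transactions round-robin so each simulated mapper gets a similar share."""
    parts = [[] for _ in range(num_mappers)]
    for i, t in enumerate(transactions):
        parts[i % num_mappers].append(t)
    return parts


def mapper(transactions, k, prev_frequent):
    """Count k-itemsets in one partition using a hash table (Counter).

    Apriori pruning: for k > 1, a candidate is counted only if all of its
    (k-1)-subsets were frequent in the previous pass.
    """
    counts = Counter()
    for t in transactions:
        for cand in combinations(sorted(set(t)), k):
            if k > 1 and any(sub not in prev_frequent
                             for sub in combinations(cand, k - 1)):
                continue
            counts[cand] += 1
    return counts


def reducer(partial_counts, min_support):
    """Merge the mappers' partial counts and keep itemsets meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def frequent_itemsets(transactions, min_support, num_mappers=2, max_k=3):
    """Run one simulated map/reduce round per itemset size k, Apriori style."""
    parts = partition(transactions, num_mappers)
    result, prev_frequent = {}, set()
    for k in range(1, max_k + 1):
        partials = [mapper(p, k, prev_frequent) for p in parts]
        level = reducer(partials, min_support)
        if not level:  # no frequent k-itemsets, so no larger ones either
            break
        result.update(level)
        prev_frequent = set(level)
    return result
```

Because the reducer sums counts over all partitions, the result does not depend on how many mappers are used; only the amount of work per mapper changes. In a real Hadoop job the mappers would run on separate nodes and the round-robin split would be replaced by the framework's own input splitting.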
