Abstract

Growing internet usage has led to the migration of large amounts of data to cloud environments, which use Hadoop and the MapReduce framework to manage various mining applications in a distributed setting. Earlier research in distributed mining focused on solving complex problems using distributed computational techniques and new algorithmic designs. However, as the nature of the data and user requirements become more complex and demanding, existing distributed algorithms fail in multiple respects. In this work, we propose a new distributed frequent pattern algorithm, Hadoop-based parallel frequent pattern mining (HPFP), to utilise clusters efficiently and mine repeated patterns from large databases effectively. The empirical evaluation shows that HPFP improves the performance of the mining operation by increasing the level of parallelism and execution efficiency. HPFP achieves complete parallelism and outperforms existing distributed pattern mining algorithms on HDFS.

