Abstract

Big data has become one of the most actively studied areas in recent years, and mining frequent patterns from it is a steadily evolving topic that has attracted considerable attention from the research community. Frequent patterns are generally mined with Apriori-based, tree-based, or hash-based algorithms, but most of these existing algorithms suffer from significant limitations. This paper proposes a new method that addresses the most common problems related to speed, memory consumption, and search space. The algorithm, named Dual Mine, employs a binary vector representation and a vertical data representation within the MapReduce framework to discover frequent patterns in large data sets. Dual Mine is then compared with several existing algorithms to assess its efficiency, and the experimental results indicate that it outperforms them by a wide margin in both speed and memory consumption.

Highlights

  • The main purpose of data mining is to unearth previously unknown patterns hidden in raw data [1]

  • The most popular task in data mining is frequent pattern mining, in which the most frequently occurring items are found

  • Frequent itemset mining has gained tremendous significance in the research community as businesses have become globalized. Businesses need to exploit their available data resources readily and to the fullest extent in order to promote their products worldwide


Summary

Introduction

The main purpose of data mining is to unearth previously unknown patterns hidden in raw data [1]. The most popular task in data mining is frequent pattern mining, in which the most frequently occurring items are found (for example, market basket analysis, commodities frequently purchased by consumers, or frequently visited pages of a website). The pioneering work in frequent itemset mining was carried out by Agrawal and Srikant, who proposed the Apriori algorithm [2]. Data mining has become an essential service that uncovers the patterns concealed in raw data and turns them into human-readable, understandable information for wider use. It has a wide range of applications in marketing, bioengineering, gene technologies, finance, and engineering. Hand, Mannila, and Smyth [4] define data mining as “the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”
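To make the frequent pattern mining task concrete, the following is a minimal sketch of support counting over a vertical, bit-vector layout of a small market basket data set. It is not the Dual Mine algorithm and does not use MapReduce; the function names (`build_bit_vectors`, `support`, `frequent_itemsets`), the sample baskets, and the `min_support` threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def build_bit_vectors(transactions):
    """Vertical layout: map each item to an int whose bit i is set
    when transaction i contains that item (illustrative helper)."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support count = number of set bits in the bitwise AND
    of the bit vectors of all items in the itemset."""
    items = iter(itemset)
    acc = vectors.get(next(items), 0)
    for item in items:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support, max_size=2):
    """Enumerate itemsets up to max_size built from frequent single items
    and keep those whose support count reaches min_support."""
    vectors = build_bit_vectors(transactions)
    frequent_items = sorted(
        item for item in vectors if support((item,), vectors) >= min_support
    )
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(frequent_items, size):
            count = support(candidate, vectors)
            if count >= min_support:
                result[candidate] = count
    return result


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
found = frequent_itemsets(baskets, min_support=2)
strict = frequent_itemsets(baskets, min_support=3)
```

Here `found` maps each frequent itemset (as a sorted tuple) to its support count. Counting support by intersecting bit vectors avoids rescanning the transactions for every candidate, which is the usual motivation for vertical representations.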

Big Data
Background of the Paper
Scope of the Paper
Challenges in the Paper
Motivation
Map Reduce
Proposed Approach
Procedure to Generate Bit Vector
Experimental Evaluation
Conclusion
