Abstract

Introduction

Many data analytics algorithms are originally designed for in-memory data. Parallel and distributed computing is a natural first remedy to scale these algorithms to "Big algorithms" for large-scale data. Many advances in Big Data analytics algorithms have been driven by MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and a fundamentally different approach is needed to arrive at a new MapReduce-based solution.

Case description

This paper investigates a case study of the scaling problem of "Big algorithms" for a popular association rule-mining algorithm, particularly the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation

Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with a 7% average performance improvement over datasets ranging from 10,000 to 120,000 transactions.

Conclusions

The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".

Highlights

  • Many data analytics algorithms are originally designed for in-memory data

  • The results confirm that effective MapReduce implementation should avoid dependent iterations, such as that of the original sequential Apriori algorithm

  • This paper presents a study of the applicability of MapReduce for scaling data analytics and machine learning algorithms to "Big algorithms" for Big Data


Summary

Discussion and evaluation

The proposed non-naive AprioriS algorithm has several advantages. First, its concept is simple, making it easy to understand and implement. Most importantly, based on both theoretical and empirical results, AprioriS is highly effective in performance while producing the same accuracy: it requires one scan of the database and a single phase of MapReduce. Not all algorithms are amenable to the MapReduce model, and the transitions of such algorithms to the MapReduce paradigm have proven to be much more complex or ineffective [16]. Examples include iterative algorithms, some of which require a chain of data to be processed for convergence or to be updated after each iteration [25, 33]. This clearly adds overhead in communication and data movement. To parallelize these algorithms, one should not simply follow the naive MapReduce-based implementation that mimics the original sequential process, but instead look for alternative solutions that effectively exploit parallelism, as sketched below.
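To make the contrast concrete, the following is a minimal Python sketch of a single-scan, single-phase MapReduce-style frequent-itemset count in the spirit described above: each mapper emits every itemset contained in a transaction, and a single reduce step sums counts and filters by minimum support, with no chained iterations between jobs. This is an illustrative sketch only; the function names, the max_len cap, and the in-memory simulation of the map and reduce phases are our assumptions, not the paper's actual implementation.

```python
from itertools import combinations
from collections import Counter

def map_transaction(transaction, max_len=3):
    """Map phase (sketch): emit (itemset, 1) for every itemset,
    up to max_len items, contained in one transaction.
    One pass over the data replaces the k dependent passes of
    the classic iterative Apriori."""
    items = sorted(set(transaction))
    for k in range(1, min(max_len, len(items)) + 1):
        for itemset in combinations(items, k):
            yield itemset, 1

def reduce_counts(pairs, min_support):
    """Reduce phase (sketch): sum counts per itemset and keep
    only itemsets meeting the minimum support threshold."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= min_support}

if __name__ == "__main__":
    # Hypothetical toy dataset to show the flow end to end.
    transactions = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["butter", "milk"],
    ]
    pairs = (p for t in transactions for p in map_transaction(t))
    print(reduce_counts(pairs, min_support=2))
    # e.g. {('bread',): 2, ('milk',): 3, ('bread', 'milk'): 2, ...}
```

By contrast, a naive MapReduce port of sequential Apriori would launch one MapReduce job per candidate length k, feeding each job the frequent itemsets of the previous one; it is exactly that chain of dependent iterations, and the communication and data movement between them, that the single-phase design avoids.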

