Architectures of big data analytics: scaling out data mining algorithms using Hadoop–MapReduce and Spark

Sheikh Kamaruddin, Vadlamani Ravi

doi:10.1049/pbpc037f_ch7

Abstract

Many statistical and machine learning (ML) techniques have been successfully applied to small datasets over the past decade and a half. Today, however, application domains such as healthcare, finance, bioinformatics, telecommunications, and meteorology generate huge volumes of data on a daily basis, and these massive datasets have to be analyzed to discover hidden insights. With the advent of the big data analytics (BDA) paradigm, data mining (DM) techniques were modified and scaled out to suit distributed and parallel environments. This chapter reviews 249 articles, published between 2009 and 2019, that implemented different DM techniques in a parallel, distributed manner in the Apache Hadoop MapReduce framework or the Apache Spark environment to solve various DM tasks. We present a critical analysis of these papers and bring out some interesting insights. We found that methods such as Apriori, support vector machine (SVM), random forest (RF), K-means, and many of their variants, along with many other approaches, have been adapted to parallel, distributed environments, where they produce scalable and effective insights. The review concludes with a discussion of open research areas and future directions that researchers and practitioners alike can explore further.
