Big data analytics on Apache Spark

Salman Salloum, Xiaojun Chen, Patrick Xiaogang Peng, Ruslan Dautov, Joshua Zhexue Huang

doi:10.1007/s41060-016-0027-9

Abstract

Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. Because Spark is a rapidly evolving open source project with an increasing number of contributors from both academia and industry, it is difficult for researchers, especially beginners in this area, to comprehend the full body of development and research behind it. In this paper, we present a technical review of big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark offers for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.

Journal: International Journal of Data Science and Analytics
Publication Date: Oct 13, 2016
Citations: 275

Similar Papers

Network security and anomaly detection with Big-DAMA, a big data analytics framework
Pedro Casas ... Giuseppe Settanni
01 Sep 2017

A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network
Muhammad Ashfaq Khan ... Md Rezaul Karim
Symmetry | VOL. 10
11 Oct 2018

A Theoretical Framework for Big Data Analytics Based on Computational Intelligent Algorithms with the Potential to Reduce Energy Consumption
Haruna Chiroma ... Usman Ali Abdullahi
01 Jan 2019

An Approach for Optimizing the Performance for Apache Spark Applications
Preeti Gupta ... Arun Sharma
01 Dec 2018
