Distributed big data analysis using spark parallel data processing

Hoger Khayrolla Omar, Alaa Khalil Jumaa

doi:10.11591/eei.v11i3.3187

Open Access

Abstract

The big data marketplace is growing rapidly. The main challenge is finding a system that can store and handle very large volumes of data and then process them to mine hidden knowledge. This paper proposes a comprehensive system for improving big data analysis performance. It combines a fast big data processing engine, Apache Spark, with a big data storage environment, Apache Hadoop. The system is tested on about 11 gigabytes of text data collected from multiple sources for sentiment analysis. Three different machine learning (ML) algorithms, all supported by the Spark ML package, are used. The system programs were written in the Java and Scala programming languages, and the constructed model combines the classification algorithms with the pre-processing steps in the form of an ML pipeline. The proposed system was implemented in both centralized and distributed data processing modes. Moreover, several dataset manipulation methods were applied in the system tests to check which one provides the best accuracy and time performance. The results show that the system handles big data efficiently: it achieves excellent accuracy with fast execution time, especially on the distributed data nodes.
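
The idea of packaging pre-processing steps and a classifier into one ML pipeline can be illustrated with a small, dependency-free Java sketch. This is not the authors' code and it does not use Spark or Hadoop: the class name `SentimentPipelineSketch`, the tiny stop-word list, the four training sentences, and the choice of a Naive Bayes classifier are all assumptions made for illustration, since the abstract does not name the three algorithms or the exact pre-processing stages.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Toy, single-machine stand-in for an ML pipeline; it does not use Spark. */
class SentimentPipelineSketch {

    // Stage 1 (assumed): lowercase the text and split on non-letter characters.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Stage 2 (assumed): drop words from a tiny, hypothetical stop-word list.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "is", "this", "it", "was");

    static List<String> removeStopWords(List<String> tokens) {
        return tokens.stream().filter(t -> !STOP_WORDS.contains(t)).collect(Collectors.toList());
    }

    // The pre-processing stages chained into one function, as a pipeline would chain them.
    static final Function<String, List<String>> PREPROCESS =
            ((Function<String, List<String>>) SentimentPipelineSketch::tokenize)
                    .andThen(SentimentPipelineSketch::removeStopWords);

    // Final stage: multinomial Naive Bayes with add-one smoothing (a stand-in classifier).
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    // Each row is {text, label}; fitting runs the same pre-processing used at prediction time.
    void fit(List<String[]> labeledTexts) {
        for (String[] row : labeledTexts) {
            docCounts.merge(row[1], 1, Integer::sum);
            Map<String, Integer> counts = wordCounts.computeIfAbsent(row[1], k -> new HashMap<>());
            for (String token : PREPROCESS.apply(row[0])) {
                counts.merge(token, 1, Integer::sum);
                vocabulary.add(token);
            }
        }
    }

    // Returns the label with the highest log-probability score for the given text.
    String predict(String text) {
        int totalDocs = docCounts.values().stream().mapToInt(Integer::intValue).sum();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            Map<String, Integer> counts = wordCounts.get(label);
            int totalWords = counts.values().stream().mapToInt(Integer::intValue).sum();
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            for (String token : PREPROCESS.apply(text)) {
                int c = counts.getOrDefault(token, 0);
                score += Math.log((c + 1.0) / (totalWords + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SentimentPipelineSketch model = new SentimentPipelineSketch();
        model.fit(List.of(
                new String[] {"This movie was great and fun", "positive"},
                new String[] {"A great, lovely experience", "positive"},
                new String[] {"This was a terrible, boring film", "negative"},
                new String[] {"It is boring and bad", "negative"}));
        System.out.println(model.predict("great fun"));
        System.out.println(model.predict("boring and terrible"));
    }
}
```

Because training and prediction both go through `PREPROCESS`, the same transformations are guaranteed to be applied in both phases, which is the main practical benefit of expressing a model as a pipeline. In a Spark-based system the stages would instead be distributed transformers and estimators operating on partitioned data.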

Journal: Bulletin of Electrical Engineering and Informatics
Publication Date: Jun 1, 2022
Citations: 5
License type: CC BY-SA 4.0

Similar Papers

Big Data and Java are integrated with machine learning
Anis Ahmed Qazi ... Ehsan Abbas
International Journal of Multidisciplinary Sciences and Arts | VOL. 3
29 Mar 2024

Legal Governance of Brain Data Derived from Artificial Intelligence
Mahika Ahluwalia
Voices in Bioethics | VOL. 7
02 Jun 2021

An Efficient Procedure for Classification of Big Data using the Deep Learning Enabled Spark Architecture and Machine Learning
International Journal of Recent Technology and Engineering | VOL. 8
02 Nov 2019

Large-scale data mining analytics based on MapReduce
01 Jan 2014
