Performance Evaluation of an Independent Time Optimized Infrastructure for Big Data Analytics that Maintains Symmetry

Satvik Vats, Bharat Bhushan Sagar, Karan Singh, Ali Ahmadian, Bruno A Pansera

doi:10.3390/sym12081274

Abstract

Traditional data analytics tools are designed to deal with asymmetrical types of data, i.e., structured, semi-structured, and unstructured. The diverse behavior of data produced by different sources requires the selection of suitable tools. Limited resources for handling huge volumes of data are a challenge for these tools and degrade their execution time. Therefore, in the present paper, we propose a time optimization model that shares a common HDFS (Hadoop Distributed File System) between three Name-nodes (master nodes), three Data-nodes, and one Client-node. These nodes work under a demilitarized zone (DMZ) to maintain symmetry. Machine learning jobs are explored from an independent platform to realize this model. On the first node (Name-node 1), Mahout is installed with all machine learning libraries through the Maven repositories. On the second node (Name-node 2), R is connected to Hadoop and runs through the Shiny server. Splunk is configured on the third node (Name-node 3) and is used to analyze the logs. Experiments are performed on the proposed and legacy models to evaluate response time, execution time, and throughput. K-means clustering, Naive Bayes, and recommender algorithms are run on three different data sets, i.e., the movie rating, newsgroup, and spam SMS data sets, representing structured, semi-structured, and unstructured data, respectively. The selection of tools defines data independence, e.g., the newsgroup data set runs on Mahout because the other tools are not compatible with this data. The experimental results support the hypothesis that the proposed model overcomes the resource limitations of the legacy model. In addition, the proposed model can process any kind of algorithm on different data sets that reside in their native formats.
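The three metrics named in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions, since this excerpt does not give the paper's exact formulas: response time is taken as the delay until a job yields its first result, execution time as the delay until it finishes, and throughput as data size divided by execution time. The names `JobMetrics`, `measure_job`, and `toy_job` are hypothetical and are not part of Hadoop, Mahout, R, or Splunk.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class JobMetrics:
    response_time_s: float   # assumed: time until the first result is available
    execution_time_s: float  # assumed: time until the job has fully finished
    throughput_mb_s: float   # assumed: data size divided by execution time


def measure_job(job: Callable[[], Iterable], data_size_mb: float) -> JobMetrics:
    """Time a job that yields its results incrementally (hypothetical helper)."""
    start = time.perf_counter()
    results = iter(job())
    next(results, None)                      # wait for the first result
    response = time.perf_counter() - start
    for _ in results:                        # drain the remaining results
        pass
    execution = time.perf_counter() - start
    throughput = data_size_mb / execution if execution > 0 else float("inf")
    return JobMetrics(response, execution, throughput)


def toy_job():
    """Stand-in for a clustering or classification job; it only sleeps."""
    for chunk in range(3):
        time.sleep(0.01)
        yield chunk


# 9216 MB is the per-data-set size reported in the highlights.
metrics = measure_job(toy_job, data_size_mb=9216)
print(metrics)
```

Running the same helper against a legacy and a proposed setup would give comparable figures for each metric, provided both runs process the same data set.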

Highlights

The term Big Data reflects a volume of data that is huge and yet growing exponentially with time
The size of each data set is 9 GB (9216 MB), and the data sets are described as follows: Data set 1: The Twenty Newsgroup data is a set of information that contains a survey of persons through the website, i.e., what kind of updates they read and what they like [47]
The present paper deals with the performance of the proposed model with respect to the legacy model, measuring the difference in response time and running time

Summary

Introduction

The term Big Data reflects a volume of data that is huge and yet growing exponentially with time. [...] via Facebook) using devices such as computers, cell phones, etc.; apart from that, remote sensors are responsible for generating heterogeneous data at a large scale. This kind of heterogeneous data may be in structured or unstructured form. Since the creation of PCs, a lot of information has been produced at a quick rate. This situation is the key inspiration for present and imminent research boundaries. The world’s total amount of data has increased nine times according to the IT company Industrial



Journal: Symmetry	Publication Date: Aug 2, 2020
Citations: 38	License type: CC BY 4.0
