A two steps method of resources utilization predication for large Hadoop data center

Lei Yu,Shangming Ning,Fei Teng,Yunshu Li,Shengdong Du,Zhe Cui

doi:10.1002/cpe.5634

Abstract

With growing demand for data processing and for Hadoop data center construction, the performance of Hadoop data centers is often limited by inappropriate resource utilization. This paper introduces a new method to predict resource utilization for large‐scale Hadoop clusters. The method adopts a two‐step model comprising performance simulation of Hadoop applications and resource utilization prediction. For the first step, a new simulator that integrates a baseline test with a multilayered network model is introduced and implemented. For the second step, a resource utilization predictor is proposed. By analyzing the pattern of resource utilization, a single task model is derived. A parallel‐batch‐task‐based (PBT) model, which represents the behavior of real Hadoop applications by integrating the single task model, is then introduced. Two test scenarios are configured to verify the performance of the method. In the data center scenario, Terasort, Wordcount, and Hive are selected as benchmarks. In the virtual machine scenario, Terasort is used as the benchmark. The experiments show that the error between the simulator results and the results from the experimental environment is less than 10% in most cases. The results confirm that the method can locate the resource bottleneck of a Hadoop cluster and allows clusters to be configured quickly for applications with massive data.
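
The abstract does not give the equations behind the single task model or the PBT model, so the following is only an illustrative sketch of the general idea: a per-task utilization profile is composed into a batch-level timeline by running tasks in parallel waves, and a prediction is then compared with a measurement by relative error. Every name here (`single_task_profile`, `batch_profile`, `relative_error`), the constant-utilization task shape, and the wave-based scheduling are assumptions made for illustration, not the authors' formulation.

```python
from math import ceil


def single_task_profile(duration_s, cpu_share):
    """Hypothetical single-task model: one task holds a constant
    fraction of a node's CPU for `duration_s` one-second steps."""
    return [cpu_share] * duration_s


def batch_profile(num_tasks, slots, task_profile):
    """Hypothetical parallel-batch composition: tasks run in waves of at
    most `slots` concurrent tasks, and per-task utilization is summed
    within a wave (capped at 1.0, i.e. a fully busy node)."""
    waves = ceil(num_tasks / slots)
    remaining = num_tasks
    timeline = []
    for _ in range(waves):
        running = min(slots, remaining)
        timeline.extend(min(1.0, running * u) for u in task_profile)
        remaining -= running
    return timeline


def relative_error(predicted, measured):
    """Relative error of a prediction against a measured value."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    profile = single_task_profile(duration_s=3, cpu_share=0.25)
    timeline = batch_profile(num_tasks=5, slots=2, task_profile=profile)
    print(timeline)
    print(relative_error(predicted=95.0, measured=100.0))
```

With five tasks and two slots, the sketch produces two full waves followed by a final wave with a single task, so utilization drops in the last wave. Under the paper's stated accuracy, a relative error computed this way would fall below 0.10 in most cases.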

Journal: Concurrency and Computation: Practice and Experience
Publication Date: Jan 8, 2020
Citations: 3
