Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

Ran Tao, Wenli Zhou, Yuanyuan Qiao, M Kavakli, M Pal, S.C.H Li, M.J.E Salami, M.A.B.M Basri, A Amini, A.B Masli

doi:10.1051/matecconf/20165605015

Open Access

https://doi.org/10.1051/matecconf/20165605015

Abstract

As Hadoop has gained popularity in the big data era, it has come to be widely used in various fields. Our self-designed and self-developed large-scale network traffic analysis cluster is built on Hadoop and runs off-line applications that analyze massive network traffic data. To evaluate the performance of the analysis cluster in a scientific and reasonable way, we propose a performance evaluation system. First, we take the execution times of three benchmark applications as the performance benchmark and pick 40 metrics of customized statistical resource data. Then we identify the relationship between the resource data and the execution times using a statistical modeling approach composed of principal component analysis and multiple linear regression. After training the models on historical data, we can predict the execution times from current resource data. Finally, we evaluate the performance of the analysis cluster using the validated predictions of execution times. Experimental results show that the execution times predicted by the trained models fall within an acceptable error range, and that the performance evaluation results are accurate and reliable.
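The modeling step described in the abstract (principal component analysis followed by multiple linear regression, trained on historical data and applied to current resource data) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the data is synthetic, the 40 columns merely stand in for the 40 resource metrics, and the names `fit_pcr`, `predict_pcr`, and `var_keep` are hypothetical.

```python
import numpy as np

def fit_pcr(X, y, var_keep=0.95):
    """Fit PCA on standardized metrics, then regress y on the component scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant metrics
    Z = (X - mean) / std
    # PCA via SVD: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Keep the smallest number of components reaching the variance target.
    k = int(np.searchsorted(np.cumsum(explained), var_keep) + 1)
    k = min(k, len(s))
    components = vt[:k].T  # shape: (n_metrics, k)
    scores = Z @ components
    # Multiple linear regression with an intercept, solved by least squares.
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}

def predict_pcr(model, X):
    """Predict execution times for new resource data with a fitted model."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic stand-in for historical data (assumption): 200 runs, 40 metrics
# driven by 3 hidden factors, and an execution time that depends on them.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 40))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 40))
y = 120.0 + latent @ np.array([15.0, -8.0, 5.0]) + rng.normal(scale=1.0, size=200)

# Train on the first 150 runs, predict the remaining 50.
model = fit_pcr(X[:150], y[:150], var_keep=0.99)
pred = predict_pcr(model, X[150:])
rel_err = np.abs(pred - y[150:]) / y[150:]
print("components kept:", model["components"].shape[1])
print("mean relative error:", float(rel_err.mean()))
```

Regressing on component scores rather than on the raw metrics avoids the instability that strongly correlated predictors would otherwise cause in the least-squares fit.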

Highlights

With the rapid development of cloud computing, Hadoop [1], an advanced big data processing tool, has become the first choice for many researchers and companies
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a set of possibly correlated original indicators, by linear combination, into a smaller set of comprehensive, linearly uncorrelated indicators while retaining most of the information in the original indicators [8]
We propose a score criterion based on these three execution times; we select a typical practical application, whose execution time reflects the practical performance of HBLSNTAC
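The PCA property stated in the highlights (correlated indicators combined linearly into fewer, uncorrelated ones that keep most of the information) can be checked numerically with a small sketch. The four indicators below are invented for illustration and the function name `pca` is hypothetical; the paper's actual metrics and tooling are not shown in this excerpt.

```python
import numpy as np

def pca(X, n_components):
    """Return component scores and explained-variance ratios for X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each indicator
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]    # linear combinations of indicators
    ratio = eigvals[:n_components] / eigvals.sum()
    return scores, ratio

# Four correlated indicators built from two independent factors (assumption).
rng = np.random.default_rng(1)
base = rng.normal(size=(300, 2))
X = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.1 * rng.normal(size=300),
    base[:, 1],
    -base[:, 1] + 0.1 * rng.normal(size=300),
])

scores, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio)
print("covariance between the two components:", np.cov(scores, rowvar=False)[0, 1])
```

In this toy case two components replace four indicators, and their mutual covariance is zero up to floating-point error, which is what "linearly uncorrelated" means in practice.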

Summary

Introduction

With the rapid development of cloud computing, Hadoop [1], an advanced big data processing tool, has become the first choice for many researchers and companies. To analyze massive traffic data efficiently, we developed a Hadoop-based Large-scale Network Traffic Analysis Cluster (HBLSNTAC). It consists of one master (running NameNode and ResourceManager), one backup master (serving as the client access server and running SecondaryNameNode), and nine slaves (running DataNode, NodeManager, and ApplicationMaster). Users run various off-line statistical analysis applications on the cluster's massive data. These applications analyze the basic statistical characteristics of network traffic, such as the time distribution or geographical distribution of the network behavior of mobile phone users. We draw the conclusion and give an outlook on future work.

The Design of Performance Evaluation System

Related Work

Performance Benchmark

Statistical Resource Data

Modeling Analysis

Principal Component Analysis

Multiple Linear Regression

Performance Evaluation

Experiment Setup

Modeling and Prediction

Prediction Verification

Multiple Practical Applications

One Single Practical Application

Summary

Conclusion and Future Work

Findings

Microsoft IT SES Enterprise Data Architect Team

Journal: MATEC Web of Conferences	Publication Date: Jan 1, 2016
Citations: 4	License type: cc-by
