Abstract

Today, increasing attention is being paid to Data Center (DC) traffic classification, since these infrastructures have become the heart of a variety of time-sensitive and data-intensive service platforms. Classification provides the tools needed to better understand traffic patterns, ensure high Quality of Service (QoS) performance, and solve scalability problems. Unfortunately, existing classification algorithms cannot efficiently handle two critical challenges in DC traffic: inter-class imbalance and critical time constraints. In this paper, we propose a novel correlation-based algorithm that follows a cost-sensitive approach combined with a Bagged Random Forest (BRF) ensemble, to address the inter-class imbalance problem while meeting the time requirements of a data center environment. In this strategy, a new method based on Reverse k-Nearest Neighbors (RkNN) is proposed to capture rebalancing weights that express inter-flow correlations, enabling online classification. We demonstrate the efficiency of the algorithm by comparing its performance with several existing data-level, algorithm-level, and cost-sensitive methods on four real-world datasets. The results reveal that the proposed algorithm outperforms most approaches on the different datasets in terms of precision, recall, F1-measure, AUC, and Kappa, whereas other algorithms yield either high precision with low recall or low precision with high recall, causing congestion or resource over-provisioning.
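The abstract does not give the RkNN weighting formula, so the Python sketch below only illustrates the general idea under stated assumptions. The reverse k-nearest neighbours of a flow are taken to be the flows that count it among their own k nearest neighbours (Euclidean distance on flow feature vectors), and each flow's weight is a hypothetical combination of the usual inverse class frequency and the share of its reverse neighbours that belong to its own class. The names `reverse_knn` and `rebalancing_weights` and the weighting rule itself are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i.
    Ties are broken by the lower index."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return [j for _, j in dists[:k]]


def reverse_knn(points, k):
    """rknn[j] = set of samples i whose k nearest neighbours include j."""
    rknn = [set() for _ in points]
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            rknn[j].add(i)
    return rknn


def rebalancing_weights(points, labels, k=3):
    """HYPOTHETICAL cost-sensitive weights: inverse class frequency scaled by the
    (Laplace-smoothed) fraction of a sample's reverse neighbours sharing its class."""
    freq = Counter(labels)
    n, c = len(labels), len(freq)
    rknn = reverse_knn(points, k)
    weights = []
    for j, y in enumerate(labels):
        base = n / (c * freq[y])  # standard "balanced" class weight
        same = sum(1 for i in rknn[j] if labels[i] == y)
        support = (1 + same) / (1 + len(rknn[j]))  # always in (0, 1]
        weights.append(base * support)
    return weights
```

This brute-force version costs O(n² log n) and is meant for readability only; an online classifier would need an incremental neighbour index. The resulting weights could then be passed to any learner that accepts per-sample weights, for example scikit-learn's `RandomForestClassifier.fit(X, y, sample_weight=...)`.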

Highlights

  • Data centers (DC) are infrastructures holding large clusters of interconnected servers

  • We evaluate the efficiency of the proposed classification approach against several state-of-the-art algorithms on different real-world datasets, using several performance metrics: precision, recall, F1-measure, running time, Area Under the Curve (AUC), and Cohen's Kappa

  • Among all the tested scenarios, we choose for each dataset the one with the best classification performance in order to compare it with other imbalanced classification algorithms
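To make the trade-off mentioned in the abstract concrete (high precision with low recall, or the reverse), the minimal Python sketch below computes precision, recall, F1-measure, and Cohen's Kappa for a binary elephant/mice labelling. It assumes elephant flows are the positive (minority) class; the function name `binary_metrics` and the toy labels are hypothetical, and AUC and running time are not covered.

```python
def binary_metrics(y_true, y_pred, positive="elephant"):
    """Precision, recall, F1 and Cohen's Kappa, treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Cohen's Kappa: observed agreement corrected for agreement expected by chance
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    # If both labellings use a single class, chance agreement is 1; report 0 here
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


if __name__ == "__main__":
    y_true = ["elephant"] * 3 + ["mice"] * 7
    y_pred = ["elephant", "elephant", "mice", "elephant"] + ["mice"] * 6
    print(binary_metrics(y_true, y_pred))
```

On the same toy labels, a trivial classifier that predicts "mice" for every flow reaches 70% accuracy yet scores 0 on recall, F1, and Kappa, which is why accuracy alone is misleading under inter-class imbalance.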


Summary

Introduction

Data centers (DC) are infrastructures holding large clusters of interconnected servers. Because of their rapid spread and the variety of services they can offer, from video on demand, web search, and gaming to storage and even computing, DCs have become the heart of today's digital world. DC traffic typically mixes a small number of large, long-lived elephant flows with many short mice flows. Elephant flows can cause buffer and link congestion, delaying latency-sensitive mice flows and thereby degrading network performance. A deeper understanding of traffic patterns can significantly improve the performance of control mechanisms, enabling adaptive, dynamic network QoS and building DCs that are more efficient in terms of resource consumption (energy, bandwidth, etc.) [4] and security (detection of malicious traffic) [5].
