Abstract

For MapReduce jobs in data centers, the shuffle phase generates network traffic that creates an east-west communication bottleneck. To address this problem, an optimization scheme is proposed that aggregates related traffic flows into local network areas. First, features of MapReduce jobs that are available before scheduling are extracted, a communication activity degree is defined for each job, and jobs are divided into two types: communication-active and communication-inactive. Next, a Bayesian classifier trained with active learning is used as the prediction model; once trained on sample data, it determines each job's type. Communication-active jobs are then deployed within the same rack to improve network bandwidth utilization. Experiments on a small-scale data center show that the scheme is clearly effective for shuffle-intensive jobs, with improvements of 4.2%-5.6%. With larger volumes of input data, the scheme shows better robustness and effectively reduces east-west communication delay in the data center.
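The pipeline described above can be illustrated with the following sketch. It classifies jobs before scheduling using a Bayesian model trained by active learning, then keeps communication-active jobs inside one rack. This is a minimal sketch under stated assumptions and is not the authors' implementation. The abstract does not specify the features, the exact definition of the communication activity degree, the Bayesian model variant, the query strategy, or the placement policy, so all of the following are hypothetical choices. The `features` pair is invented for illustration. `oracle_label`, which applies a 0.5 threshold to `shuffle_ratio`, stands in for the activity degree. Gaussian naive Bayes stands in for the Bayesian classifier, and uncertainty sampling serves as the active-learning strategy. Finally, "deployed in the same rack" is read as "all of an active job's tasks share one rack."

```python
import math
from dataclasses import dataclass

ACTIVE, INACTIVE = 1, 0


@dataclass
class Job:
    name: str
    features: tuple       # hypothetical pre-scheduling features, e.g. (historical shuffle ratio, reducers per mapper)
    shuffle_ratio: float  # hypothetical ground truth, read only by the labeling oracle
    tasks: int = 1        # number of task slots the job needs


def oracle_label(job, threshold=0.5):
    """Hypothetical stand-in for the paper's communication activity degree."""
    return ACTIVE if job.shuffle_ratio >= threshold else INACTIVE


class GaussianNB:
    """Two-class Gaussian naive Bayes: class prior plus per-feature mean and variance."""

    def fit(self, X, y):
        # Assumes both classes appear in the training set (seed it with one of each).
        self.params = {}
        for c in (INACTIVE, ACTIVE):
            rows = [x for x, label in zip(X, y) if label == c]
            prior = (len(rows) + 1) / (len(X) + 2)  # Laplace-smoothed class prior
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # A small variance floor keeps single-sample classes from collapsing to zero.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-3
                         for col, m in zip(cols, means)]
            self.params[c] = (math.log(prior), means, variances)
        return self

    def predict_proba(self, x):
        """Return P(ACTIVE | x), computed in log space for numerical stability."""
        log_post = {}
        for c, (log_prior, means, variances) in self.params.items():
            ll = log_prior
            for v, m, s2 in zip(x, means, variances):
                ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            log_post[c] = ll
        top = max(log_post.values())
        p_active = math.exp(log_post[ACTIVE] - top)
        p_inactive = math.exp(log_post[INACTIVE] - top)
        return p_active / (p_active + p_inactive)


def fit_on(jobs):
    # Labels come from the (hypothetical) oracle, i.e. the "expert" queried during active learning.
    return GaussianNB().fit([j.features for j in jobs], [oracle_label(j) for j in jobs])


def active_learn(seed, pool, budget):
    """Uncertainty sampling: repeatedly label the pooled job whose P(active) is closest to 0.5."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        model = fit_on(labeled)
        query = min(pool, key=lambda j: abs(model.predict_proba(j.features) - 0.5))
        pool.remove(query)
        labeled.append(query)
    return fit_on(labeled)


def place(jobs, model, free_slots):
    """Map each job name to a list of racks, one entry per task."""
    free = dict(free_slots)
    plan = {}
    # Predicted-active jobs go first so they have the best chance of a single-rack fit.
    for job in sorted(jobs, key=lambda j: model.predict_proba(j.features) < 0.5):
        if model.predict_proba(job.features) >= 0.5:
            fits = [r for r in free if free[r] >= job.tasks]
            if fits:
                rack = min(fits, key=free.get)  # best fit: tightest rack that holds the whole job
                plan[job.name] = [rack] * job.tasks
                free[rack] -= job.tasks
                continue
        # Inactive jobs (or active jobs no single rack can hold) spread over the emptiest racks.
        racks = []
        for _ in range(job.tasks):
            rack = max(free, key=free.get)
            if free[rack] == 0:
                raise ValueError(f"not enough free slots for {job.name}")
            racks.append(rack)
            free[rack] -= 1
        plan[job.name] = racks
    return plan
```

Uncertainty sampling is only one common active-learning query strategy, and best-fit rack selection is only one way to keep an active job's shuffle traffic inside a rack. The paper may use different choices for either. For example, `place(jobs, active_learn(seed, pool, budget=4), {"r1": 4, "r2": 4})` returns a dictionary that maps each job name to the rack of every task.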
