Abstract

In this paper, we identify and discuss the main traffic challenges of Hadoop jobs in data centre networks (DCNs). A high volume of traffic is generated during the shuffle phase, when the output data of mapper nodes is transferred to reducer nodes. This traffic requires efficient use of network resources to accelerate the shuffle phase of Hadoop jobs. The equal-cost multipath (ECMP) algorithm is used in DCNs to route flows and achieve high bandwidth utilization. However, its scheduling is not dynamic and lacks a global view of the entire network. Consequently, we propose an effective scheduling and routing algorithm based on software-defined networking (SDN) to obtain efficient bandwidth utilization for each shuffle flow in the DCN. Compared with ECMP and TRILL (Transparent Interconnection of Lots of Links), the experimental results show that the proposed algorithm increases the bandwidth utilization of a leaf-spine topology in the DCN and provides dynamic scheduling and routing. It also shortens the execution time of the shuffle phase, which improves the performance of Hadoop jobs.
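The contrast between static ECMP-style path selection and scheduling with a global view can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a leaf-spine fabric in which each flow crosses exactly one spine, a CRC32 hash of the flow 5-tuple as the ECMP hash, and a greedy least-loaded rule standing in for a controller with a global view. The function names (`ecmp_pick`, `least_loaded_pick`, `schedule`) and the flow demands are hypothetical.

```python
import zlib


def ecmp_pick(flow, num_spines):
    """Static ECMP-style choice: hash the flow 5-tuple and ignore link load."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % num_spines


def least_loaded_pick(spine_load):
    """Global-view choice (assumed rule): the spine carrying the least load."""
    return min(range(len(spine_load)), key=lambda s: spine_load[s])


def schedule(flows, num_spines, global_view):
    """Place flows one by one and return the resulting load per spine."""
    load = [0.0] * num_spines
    for flow, demand in flows:
        if global_view:
            spine = least_loaded_pick(load)
        else:
            spine = ecmp_pick(flow, num_spines)
        load[spine] += demand
    return load


if __name__ == "__main__":
    # Hypothetical shuffle flows: (src, dst, src_port, dst_port, proto), demand in Mbit/s.
    flows = [
        (("10.0.0.1", "10.0.1.1", 50010 + i, 13562, "tcp"), 100.0)
        for i in range(8)
    ]
    print("ECMP-style hash:", schedule(flows, 4, global_view=False))
    print("Global view:    ", schedule(flows, 4, global_view=True))
```

With equal-demand flows, the greedy global-view rule spreads the load evenly across spines, whereas the hash-based choice may place several large shuffle flows on the same spine because it never consults the current load.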
