Abstract

Big data applications running on Hadoop typically impose heavy bandwidth demand and create network bottlenecks in current data center networks (DCNs). On one hand, DCN design does not take the traffic demand and traffic patterns of Hadoop applications into account. On the other hand, Hadoop suffers from inherent performance limitations because its approach to transmitting massive data sets relies on application-layer overlays and ignores the DCN architecture. To improve the performance of Hadoop applications, in this paper we propose OEHadoop, a modified Hadoop built by co-designing Hadoop with a hybrid optical and electrical data center network. For Hadoop, we redesign the pipeline-based replication process of MapReduce jobs to use optical multicast. For the DCN architecture, we build a reconfigurable optical multicast system that adapts the DCN architecture to multicast traffic. A software-defined networking (SDN) controller is implemented in the data center to adjust the DCN architecture and exchange information with the application layer. To accelerate MapReduce jobs, we present a new algorithm for scheduling multicast requests and deploy it in the controller. We build a small-scale prototype of OEHadoop to evaluate the control overhead and to demonstrate the feasibility of the design. In a simulation at the scale of a real DCN, we show that our multicast request scheduling algorithm outperforms related state-of-the-art solutions, and that MapReduce jobs in OEHadoop run about 2 times faster on average than in native Hadoop.
