Highlights

- We analyze the impact of core affinity on both network and disk I/O performance.
- Both parallelism and locality are important for tasks that access the disk and network.
- We propose a novel approach that dynamically decides the core affinity of HDFS threads.
- Our dynamic core affinity improves file upload throughput by more than 30%.

Abstract

The MapReduce programming model, in which data nodes perform both data storing and computation, was introduced for big-data processing. We therefore need to understand the different resource requirements of data-storing and computation tasks and schedule them efficiently over multi-core processors. In particular, providing high-performance data storing has become more critical because of the continuously increasing volume of data uploaded to distributed file systems and database servers. However, analyzing the performance characteristics of the processes that store upstream data is intricate, because both network and disk input/output (I/O) are heavily involved in their operation. In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel approach to dynamic core affinity for high-throughput file upload. We consider the dynamic changes in processor load and in the intensiveness of file upload at run time, and accordingly decide the core affinity of service threads, with the objective of maximizing parallelism, data locality, and resource efficiency. We apply the dynamic core affinity to the Hadoop Distributed File System (HDFS). Measurement results show that our implementation improves the file upload throughput of end applications by more than 30% compared with default HDFS, and provides better scalability.
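The abstract does not spell out the affinity-decision policy, but the underlying OS mechanism it relies on is thread-to-core pinning. The sketch below is a minimal illustration of that mechanism on Linux, using sched_setaffinity() to bind the calling thread to a chosen core. The least_loaded_core() helper is a hypothetical placeholder for the paper's run-time decision, which also weighs I/O intensiveness and data locality; neither is modeled here. Since HDFS is written in Java, a real implementation would reach this system call through JNI or a native affinity library.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical placeholder for the run-time affinity decision.
 * A real policy might read per-core utilization from /proc/stat
 * and factor in the upload intensity of each service thread. */
static int least_loaded_core(int ncores)
{
    (void)ncores;   /* trivial stand-in: always pick core 0 */
    return 0;
}

/* Pin the calling thread to a single core; returns 0 on success. */
static int pin_current_thread(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
    int ncores = (int)sysconf(_SC_NPROCESSORS_ONLN);
    int core = least_loaded_core(ncores);

    if (pin_current_thread(core) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core %d of %d\n", core, ncores);
    return 0;
}

In a dynamic scheme like the one the paper describes, a call of this form would be re-evaluated at run time as processor load and upload intensity change, rather than fixing the mapping once at thread creation.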
