On the core affinity and file upload performance of Hadoop

Joong-Yeon Cho,Hyun-Wook Jin,Karsten Schwan,Min Lee

doi:10.1145/2534645.2534651

Abstract

The MapReduce programming model is introduced for big-data processing, where the data nodes perform both data storing and computation. Thus, we need to understand different resource requirements of data storing and computation tasks and schedule these efficiently over multi-core processors. The core affinity defines mapping between a set of cores and a given task. The core affinity can be decided based on resource requirements of a task because this largely affects the efficiency of computation, memory, and I/O resource utilization. In this paper, we analyze the impact of core affinity on the file upload performance of Hadoop Distributed File System (HDFS). Our study can provide the insight into the process scheduling issues on big-data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve the throughput more than 40% compared with default Linux system.

Full Text