Peacock: a customizable MapReduce for multicore platform

Song Wu,Yaqiong Peng,Jun Zhang,Hai Jin

doi:10.1007/s11227-014-1238-2

Abstract

MapReduce has been demonstrated to be a promising alternative to simplify parallel programming with high performance on single multicore machine. Compared to the cluster version, MapReduce does not have bottlenecks in disk and network I/O on single multicore machine, and it is more sensitive to characteristics of workloads. A single execution flow may be inefficient for many classes of workloads. For example, the fixed execution flow of the MapReduce program structure can impose significant overheads for workloads that inherently have only one emitted value per key, which are mainly caused by the unnecessary reduce phase. In this paper, we refine the workload characterization from Phoenix++ according to the attributes of key-value pairs, and give a demonstration that the refined workload characterization model covers all classes of MapReduce workloads. Based on the model, we propose a new MapReduce system with workload-customizable execution flow. The system, namely Peacock, is implemented on top of Phoenix++. Experiments with four different classes of benchmarks on a 16-core Intel-based server show that Peacock achieves better performance than Phoenix++ for workloads that inherently have only one emitted value per key (up to a speedup of $$3.6\times $$ 3.6 × ) while identical for other classes of workloads.

Full Text