Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems, and the increasing energy efficiency of GPUs, this paper presents a time–energy performance analysis of MapReduce on heterogeneous systems with GPUs. We evaluate the time and energy performance of three MapReduce applications with diverse resource demands on a Hadoop–CUDA framework. As executing these applications on heterogeneous systems with GPUs is challenging, we introduce a novel lazy processing technique which requires no modifications to the underlying Hadoop framework. To analyze the impact of heterogeneity, we compare heterogeneous CPU+GPU execution with homogeneous CPU-only execution across three systems with diverse characteristics: (i) a traditional high-performance (brawny) Intel i7 system hosting a discrete 640-core Nvidia GPU of the latest Maxwell generation, (ii) a wimpy platform consisting of a quad-core ARM Cortex-A9 hosting the same discrete Maxwell GPU, and (iii) a wimpy platform integrating four ARM Cortex-A15 cores and 192 Nvidia Kepler GPU cores on the same chip. These systems encompass both intra-node heterogeneity with discrete GPUs and intra-chip heterogeneity with integrated GPUs. Our measurement-based performance analysis highlights the following results. For compute-intensive workloads, the brawny heterogeneous system achieves speedups of up to 2.3 and reduces the energy usage by almost half compared to the brawny homogeneous system. As expected, for applications where data transfers dominate the execution time, heterogeneous systems exhibit worse time–energy performance than homogeneous systems. For such applications, the heterogeneous wimpy A9 system with a discrete GPU uses around 14 times the energy of the homogeneous A9 system due to both system resource imbalances and the high power overhead of the discrete GPU. However, among the heterogeneous systems, the wimpy A15 with integrated GPU uses the lowest energy across all workloads.
This allows us to establish an execution time equivalence ratio between a single brawny node and multiple wimpy nodes. Based on this equivalence ratio, the wimpy nodes exhibit energy savings of two-thirds while maintaining the same execution time. This result supports the potential use of heterogeneous wimpy systems with integrated GPUs for Big Data analytics.