Understanding system design for Big Data workloads

H P Hofstee, G C Chen, P W Y Wong, J Li, J Herring, J W Shi, K Hall, F H Gebara, D Jamsek, Y Li

doi:10.1147/jrd.2013.2242674

Abstract

This paper explores the design and optimization implications for systems targeted at Big Data workloads. We confirm that these workloads differ in fundamental ways from the workloads typically run on more traditional transactional and data-warehousing systems, and, therefore, a system optimized for Big Data can be expected to differ from these other systems. Rather than studying only the performance of representative computational kernels and focusing on central-processing-unit performance, this paper studies the system as a whole. We identify three major phases in a typical Big Data workload, and we propose that each of these phases should be represented in a Big Data systems benchmark. We implemented our ideas on two distinct IBM POWER7® processor-based systems that target different market sectors, and we analyze their performance on a sort benchmark. In particular, this paper includes an evaluation of POWER7 processor-based systems using MapReduce TeraSort, a workload that can stress multiple dimensions of system performance. We combine this work with a broader perspective on Big Data workloads and suggest a direction for a future benchmark definition effort. We also propose a number of methods to further improve system performance.
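MapReduce TeraSort is singled out above as a multi-dimensional stress test. The sketch below is a minimal, single-process illustration of the sample, range-partition, and local-sort pattern commonly associated with TeraSort. It is not the paper's or Hadoop's implementation. It assumes TeraGen-style 100-byte records whose first 10 bytes are the sort key, and the function names (`sample_split_points`, `partition`, `terasort_sketch`) are hypothetical. In a real cluster the bucketing step corresponds to the shuffle, where every record crosses the network, which is one reason the workload exercises storage, network, and CPU together.

```python
import bisect
import random


def sample_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 split points from a random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become range boundaries.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition(key, split_points):
    """Route a key to the (hypothetical) reducer that owns its key range."""
    return bisect.bisect_right(split_points, key)


def terasort_sketch(records, num_partitions=4, sample_size=1000):
    """Return (globally sorted records, per-partition record counts).

    Assumes each record is a bytes object whose first 10 bytes are the key.
    """
    if not records:
        return [], [0] * num_partitions
    keys = [r[:10] for r in records]
    splits = sample_split_points(keys, num_partitions, sample_size)
    buckets = [[] for _ in range(num_partitions)]
    # "Map" + "shuffle": send each record to the bucket owning its key range.
    for r in records:
        buckets[partition(r[:10], splits)].append(r)
    # "Reduce": each bucket is sorted independently; because the buckets
    # cover disjoint, ordered key ranges, concatenation is globally sorted.
    out = []
    for b in buckets:
        out.extend(sorted(b, key=lambda r: r[:10]))
    return out, [len(b) for b in buckets]
```

The per-partition counts show how well the sampled split points balance work across reducers; a skewed sample leaves some reducers with far more data than others, which in a real cluster turns into a straggler.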



Journal: IBM Journal of Research and Development
Publication Date: May 1, 2013
Citations: 8

Similar Papers

Characterizing the efficiency of data deduplication for big data storage management
Ruijin Zhou ... Ming Liu
01 Sep 2013

Characterizing and subsetting big data workloads
Zhen Jia ... Sally A Mckee
01 Oct 2014

Big data and HPC collocation: Using HPC idle resources for Big Data analytics
Michael Mercier ... Olivier Richard
13 Nov 2017

Exploring Opportunities for Non-volatile Memories in Big Data Applications
Wei Wei ... Jin Xiong
01 Jan 2014
