Abstract

In this work, we conduct a detailed memory characterization of a representative set of modern data-management software (Cassandra, MongoDB, OrientDB and Redis) running an illustrative NoSQL benchmark suite (YCSB). These applications are widely used NoSQL databases with different data models and features such as in-memory storage. We compare the behavior of these data-serving applications with that of other well-known benchmarks, such as SPEC CPU2006, PARSEC and NAS Parallel Benchmark. The evaluation methodology relies on state-of-the-art full-system simulation tools, such as gem5. This allows us to explore configurations that are unattainable using performance monitoring units in actual hardware, and thus to characterize memory properties. The results suggest that NoSQL application behavior is not dissimilar to that of conventional workloads. Therefore, some of the optimizations present in state-of-the-art hardware might benefit these applications directly. Nevertheless, there are some common aspects that distinguish them from conventional benchmarks and that might be sufficiently relevant to be considered in architectural design. Strikingly, we also found that most database engines, independently of aspects such as workload or database size, exhibit highly uniform behavior. Finally, we show that different database engines make highly distinctive demands on the memory hierarchy, some being more stringent than others.

Highlights

  • Concerning information technologies, one of the broad fields with a large social and economic impact is Big-data Analytics

  • Despite the relevant findings of all these characterization works, the methodology employed has a significant limitation: the fixed nature of the microarchitecture under study. This limitation leaves many questions unanswered, such as: What is the appropriate size for instruction caching, and for data caching? Are these applications responsive to performance mechanisms such as replacement policy or prefetching? What is the degree of sharing in these multi-threaded applications? In this paper, we conduct the experiments required to answer these questions using an alternative methodology: a full-system simulation tool that (1) allows the large software stack to be executed without changes and (2) is fast enough to perform the complex warmup of the applications in a feasible amount of time

  • We have conducted a simulation-driven characterization of four modern NoSQL databases: Cassandra, MongoDB, OrientDB and Redis

1045-9219 (c) 2017 IEEE

Summary

INTRODUCTION

Centralized storage and processing, able to manage data analytics through relational models and SQL querying, are gradually being replaced by alternatives more focused on the software side to cope with the need for scalability under constrained cost. These approaches are based on fully distributed storage and processing frameworks running on commodity hardware [1]. The adoption of mechanisms that are able to meet data processing speed demands becomes essential, with solutions such as in-memory storage [7] or high-performance processing frameworks [8]. In contrast to this software-centric paradigm shift, the underlying hardware has, in general, little to no specialization. Numerous previous works [9][10][11][12] have made a remarkable effort in the characterization of Big-data applications. These works make use of benchmark suites covering broad application scenarios, in most cases running on current hardware. We describe some related work and summarize our main conclusions in Sections VI and VII, respectively.

YCSB FRAMEWORK
NoSQL AND CONVENTIONAL APPLICATIONS
METHODOLOGY
Workload Generation
Validation
APPLICATION CHARACTERIZATION
Instruction Profile
Data Working set
Data Working-set sensitivity to Database Size
Data Working-set Sensitivity to Record Distribution
Instruction Working set
MULTI-LEVEL HIERARCHY PERFORMANCE
MPKI across the Memory Hierarchy
Replacement Policy
Hardware Prefetching
RELATED WORK
Findings
CONCLUSIONS