Abstract

The latest CPUs employ multiple cores, deeply superscalar pipelines, out-of-order execution across large instruction windows, and advanced SIMD capabilities, all of which can hide memory access latency, and most recent memory-oriented data structures already benefit from these features. However, due to the complexity of data organization, these CPUs do not always perform well in main memory database systems (MMDBs), particularly when data reside in dynamic random-access memory (DRAM). This article studies memory-efficient data structures by analyzing run time, access latency, cache misses, instructions per cycle (IPC), and DRAM reads (bytes). We then design and implement two data organization schemas in a main memory database: dispersing data block organization and clustering data block organization. With algorithm engineering and careful attention to internal parallelism and cache alignment, memory access latency can be hidden. However, we find that while these data structures work well in some cases, they fall short in the face of complex access paths. To determine the reasons, we study the impact of database techniques on memory access latency, including data partitioning, storage models, and processing algorithms. Using a specific main memory database system, we evaluate the performance of each data organization schema on DDR4 DRAM and the Intel Haswell microarchitecture. In conclusion, this work makes efficient DRAM access applicable in real-world situations by implementing the schemas in systems such as in-memory databases.
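The abstract names the two schemas but not their layouts; the following C++ sketch illustrates one plausible reading, where a clustering block stores whole tuples contiguously and a dispersing block splits attributes into separate arrays. The type names, block size, and tuple shape are illustrative assumptions, not the paper's implementation.

```cpp
// Minimal sketch of the two block layouts contrasted in this article.
// Names (Tuple, BLOCK_TUPLES) and exact layouts are assumptions.
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t BLOCK_TUPLES = 1024;  // tuples per data block (assumed)

struct Tuple {
    std::uint64_t key;
    std::uint64_t payload[3];
};

// Clustering organization: whole tuples stored contiguously in one block,
// so a point lookup touches a single, cache-friendly region.
struct ClusteringBlock {
    std::array<Tuple, BLOCK_TUPLES> tuples;
};

// Dispersing organization: each attribute dispersed into its own array,
// so a scan over one attribute streams sequentially through DRAM.
struct DispersingBlock {
    std::array<std::uint64_t, BLOCK_TUPLES> keys;
    std::array<std::uint64_t, BLOCK_TUPLES> payload0;
    std::array<std::uint64_t, BLOCK_TUPLES> payload1;
    std::array<std::uint64_t, BLOCK_TUPLES> payload2;
};
```

The trade-off mirrors the classic row-store versus column-store split: clustering favors tuple-at-a-time access paths, while dispersing favors attribute scans with hardware-prefetcher-friendly strides.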

Highlights

  • Large-scale highly interactive online applications and batch-processing offline applications require either low latency or high throughput for processing huge transactional and analytical query workloads

  • While emerging byte-addressable nonvolatile memories (NVMs) enable data persistence, keeping data resident in dynamic random-access memory (DRAM) allows faster access to hot data [1]

  • Google BTree is an implementation of an ordered in-memory container based on a B-tree data structure (see the sketch after this list)
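As a usage sketch of the container mentioned above, the snippet below assumes the Abseil release of Google's B-tree map (absl::btree_map); the older standalone cpp-btree library exposes an equivalent interface.

```cpp
// Usage sketch of Google's B-tree ordered container, assuming Abseil's
// absl::btree_map; requires linking against the Abseil library.
#include <cstdint>
#include <iostream>

#include "absl/container/btree_map.h"

int main() {
    absl::btree_map<std::uint64_t, std::uint64_t> index;

    // Inserts keep keys sorted; many keys share one B-tree node, so a
    // lookup touches far fewer cache lines than a binary search tree.
    for (std::uint64_t k = 0; k < 1000; ++k) index[k] = k * 2;

    // Ordered range scan starting at the first key >= 100.
    for (auto it = index.lower_bound(100);
         it != index.end() && it->first < 110; ++it) {
        std::cout << it->first << " -> " << it->second << "\n";
    }
    return 0;
}
```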


Summary

INTRODUCTION

Large-scale highly interactive online applications and batch-processing offline applications require either low latency or high throughput for processing huge transactional and analytical query workloads. We gather diverse memory-efficient data structures, design the access patterns, and observe their memory access performance on a target database system with data resident in DRAM. Hardware prefetching mechanisms, which do not require the core pipeline to execute additional instructions to compute and issue prefetches, avoid the instruction overhead, inflexibility, and limited latency tolerance of software prefetching, as described in References [32]–[34]. While these techniques work well for memory-latency-bound workloads, they focus on long pointer chains and specific operations, such as hash table probes and tree traversals, and are therefore only useful for specific problems. We study them because they are the latest main memory techniques, and trees and hash tables are the data organizations most widely used in database systems. We faithfully implement these techniques in a real-life memory resident application, PELOTON.
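To make the software-prefetching pattern referenced above concrete, the sketch below batches hash-table probes and issues a prefetch a fixed distance ahead so DRAM latency overlaps with useful work. This is not Peloton's code; the bucket layout, the identity hash, and the prefetch distance are assumptions for illustration.

```cpp
// Illustrative software prefetching for batched hash-table probes,
// using the GCC/Clang __builtin_prefetch intrinsic.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bucket {
    std::uint64_t key;
    std::uint64_t value;
};

// Probe a batch of keys against an open-addressing table whose size is a
// power of two (assumed). Prefetching bucket i+DIST while processing
// bucket i hides the dependent-load latency of each random DRAM access.
std::uint64_t probe_batch(const std::vector<Bucket>& table,
                          const std::vector<std::uint64_t>& keys) {
    constexpr std::size_t DIST = 8;             // prefetch distance (tuning assumption)
    const std::size_t mask = table.size() - 1;  // valid only for power-of-two sizes
    std::uint64_t hits = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i + DIST < keys.size()) {
            // Identity hash (key & mask) kept deliberately simple here.
            __builtin_prefetch(&table[keys[i + DIST] & mask], /*rw=*/0, /*locality=*/1);
        }
        const Bucket& b = table[keys[i] & mask];
        if (b.key == keys[i]) hits += b.value;
    }
    return hits;
}
```

This is the same group-prefetching idea the cited hardware mechanisms automate; the software version pays extra instructions per probe but needs no prefetcher support from the core.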

BACKGROUND
PERFORMANCE COMPARISON ON PIBENCH
DATA SCALE
PARTITION
PROJECTIVITY
THREAD CONTENTION
DATA PREFETCHING
DISCUSSION AND CONCLUSION