Abstract

Recent breakthroughs in Storage Class Memory (SCM) technologies have driven Persistent Memory (PM) devices to become commodity off-the-shelf components in 2018. PM devices are byte addressable, plug into the memory interconnect, and run at near-memory speeds, densities and price points. PM availability is led by Fast PM, comprising backed-DRAM devices such as NVDIMM-N, and will be followed soon by Slow PM, built from new SCM materials such as Intel 3D XPoint NVDIMMs. Fast and Slow PM devices vary in speed, density and cost, but both are orders of magnitude faster than Flash devices and an order of magnitude more expensive per GB.

A PM-based file system was shown to accelerate unmodified transactional databases [2, 1] when the entire dataset was placed on NVDIMM-N cards. Most databases, however, are large and cannot fit entirely into the limited capacity provided by PM devices, and even if they could, the high price per GB would prevent wide adoption.

This work explores accelerating unmodified databases using software that supports both NVDIMM-N and Flash devices and can transparently tier data between them. Ideally, this provides the performance benefits of PM while maintaining the cost structure of Flash solutions. We run a transactional workload (DBT-2) on an unmodified PostgreSQL [3] database, and compare the default block-based file system running on Flash NVMe to M1FS, the first file system to support auto-tiering between byte-addressable NVDIMM devices and block-addressable Flash. The rest of the server and the operating system version are identical for both configurations (refer to Table 1).

M1FS auto-tiering between PM pages and Flash blocks was implemented using the following architecture (sketched in code after the list):

• Each 4KB of data can reside on a PM page, a Flash block, or both at the same time.
• Data is speculatively copied to a Flash block ahead of needing to reuse the PM page, in order to hide the slower Flash access time.
• Unless data is modified, an existing Flash copy is maintained in order to reduce Flash wear-out.
• PM pages are maintained in many queues in order to reduce the probability of lock contention when many cores are used concurrently.
• Page allocations are preferably done from PM attached to the CPU socket of the allocating core (NUMA-aware FS).
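The first three bullets describe a per-page state machine. The following minimal C sketch illustrates one way such a state machine could look; all names (tiered_page, tier_state, flash_write, prepare_for_eviction) are hypothetical illustrations, not the actual M1FS implementation:

    /* Illustrative sketch only; names and layout are assumptions,
     * not the actual M1FS code. */
    #include <stdint.h>

    /* Each 4 KB extent can live on a PM page, a Flash block, or both. */
    enum tier_state {
        ON_PM_ONLY,      /* resident only in persistent memory    */
        ON_FLASH_ONLY,   /* resident only on Flash                */
        ON_BOTH_CLEAN,   /* PM copy identical to the Flash copy   */
        ON_BOTH_DIRTY,   /* PM copy modified; Flash copy is stale */
    };

    struct tiered_page {
        enum tier_state state;
        uint64_t pm_page;      /* PM page frame, valid unless ON_FLASH_ONLY */
        uint64_t flash_block;  /* Flash block address, if a copy exists     */
    };

    /* Hypothetical I/O helper: writes one PM page to a Flash block. */
    extern void flash_write(uint64_t flash_block, uint64_t pm_page);

    /* Speculative copy-out: write the PM copy to Flash *before* the PM
     * page is needed for reuse, hiding the slower Flash write latency.
     * An unmodified Flash copy is kept, so reclaiming an ON_BOTH_CLEAN
     * page needs no Flash write, which reduces Flash wear-out. */
    static void prepare_for_eviction(struct tiered_page *p)
    {
        if (p->state == ON_PM_ONLY || p->state == ON_BOTH_DIRTY) {
            /* Flash block allocation, if needed, is not shown. */
            flash_write(p->flash_block, p->pm_page);
            p->state = ON_BOTH_CLEAN;
        }
        /* ON_BOTH_CLEAN: the PM page can be reclaimed immediately by
         * flipping the state to ON_FLASH_ONLY. */
    }

    /* On a write to the PM copy, any existing Flash copy becomes stale. */
    static void mark_dirty(struct tiered_page *p)
    {
        if (p->state == ON_BOTH_CLEAN)
            p->state = ON_BOTH_DIRTY;
    }

Under this scheme the common eviction path pays no synchronous Flash latency, because the copy-out already happened speculatively in the background.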
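The last two bullets (many free-page queues to dilute lock contention, plus socket-local allocation) could be sketched as follows; again, every name (pm_alloc, pm_free_queue, QUEUES_PER_NODE) is an assumption for illustration:

    /* Illustrative sketch of NUMA-aware, multi-queue PM page allocation;
     * not the actual M1FS code. Locks are assumed to be initialized with
     * pthread_spin_init() at mount time (not shown). */
    #include <pthread.h>
    #include <stddef.h>

    #define QUEUES_PER_NODE 64  /* many queues per node to dilute contention */
    #define MAX_NODES 4

    struct pm_page { struct pm_page *next; };

    struct pm_free_queue {
        pthread_spinlock_t lock;
        struct pm_page *head;   /* singly linked free list of PM pages */
    };

    static struct pm_free_queue freelists[MAX_NODES][QUEUES_PER_NODE];

    struct pm_page *pm_alloc(int cpu, int node)
    {
        /* Hash the calling CPU onto one of the node-local queues, so
         * concurrent cores usually hit different locks. */
        int start = cpu % QUEUES_PER_NODE;

        /* First pass: prefer PM attached to the local socket (NUMA-aware). */
        for (int i = 0; i < QUEUES_PER_NODE; i++) {
            struct pm_free_queue *q =
                &freelists[node][(start + i) % QUEUES_PER_NODE];
            if (pthread_spin_trylock(&q->lock) == 0) {
                struct pm_page *pg = q->head;
                if (pg)
                    q->head = pg->next;
                pthread_spin_unlock(&q->lock);
                if (pg)
                    return pg;
            }
        }
        /* Fallback: steal from a remote node's queues (slower, but keeps
         * allocations succeeding when local PM is exhausted). Not shown. */
        return NULL;
    }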
