Memory-intensive Workloads Research Articles

The rapid growth of data in recent years requires datacenter infrastructure to store and process data with extremely high throughput and low latency. Fortunately, persistent memory (PM) and RDMA technologies bring new opportunities toward this goal: both can deliver more than 10 GB/s of bandwidth and sub-microsecond latency. However, our past experience and recent studies show that it is non-trivial to build an efficient distributed storage system with such new hardware. In this article, we design and implement TH-DPMS (TsingHua Distributed Persistent Memory System), which is based on persistent memory and RDMA and unifies the memory, file system, and key-value interfaces in a single system. TH-DPMS is built on a unified distributed persistent memory abstraction, pDSM. pDSM acts as a generic layer that connects the PMs of different storage nodes via a high-speed RDMA network and organizes them into a global shared address space. It provides the fundamental functionality, including global address management, space management, fault tolerance, and crash consistency guarantees. Applications can access pDSM through a set of flexible, easy-to-use APIs, using either raw read/write interfaces or transactional interfaces with ACID guarantees. On top of pDSM, we implement a distributed file system, pDFS, and a key-value store, pDKVS. Together, they provide TH-DPMS with high-performance, low-latency, fault-tolerant data storage. We evaluate TH-DPMS with both micro-benchmarks and real-world memory-intensive workloads. Experimental results show that TH-DPMS delivers an aggregate bandwidth of 120 GB/s with 6 nodes. When processing memory-intensive workloads such as YCSB and Graph500, TH-DPMS improves performance by an order of magnitude compared to existing systems and maintains consistently high efficiency as the workload size grows to multiple terabytes.
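The abstract describes pDSM's global shared address space only at a high level. As a hedged illustration of what organizing the PMs of several nodes into one address space might involve, the Python sketch below packs a node ID and a local offset into a single 64-bit global address and hands out space round-robin across nodes. The 16/48-bit split and the names `make_gaddr`, `split_gaddr`, and `GlobalAllocator` are hypothetical choices for illustration, not TH-DPMS's actual design; RDMA transfers, freeing, fault tolerance, and crash consistency are all omitted.

```python
from dataclasses import dataclass

# Hypothetical layout (assumption, not TH-DPMS's real format): the top 16 bits
# of a 64-bit global address select the storage node, and the low 48 bits give
# the byte offset inside that node's registered persistent-memory region.
NODE_BITS = 16
OFFSET_BITS = 48
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_gaddr(node_id: int, offset: int) -> int:
    """Pack (node, local offset) into one 64-bit global address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset out of range")
    return (node_id << OFFSET_BITS) | offset


def split_gaddr(gaddr: int) -> tuple[int, int]:
    """Recover (node, local offset); a client would then issue an RDMA
    read or write to that node at that offset."""
    return gaddr >> OFFSET_BITS, gaddr & OFFSET_MASK


@dataclass
class NodeRegion:
    node_id: int
    capacity: int       # bytes of PM this node contributes
    next_free: int = 0  # simple bump pointer; space is never reclaimed


class GlobalAllocator:
    """Toy space manager: round-robin allocation across nodes with a
    per-node bump pointer (no freeing, no replication)."""

    def __init__(self, capacities: list[int]):
        self.regions = [NodeRegion(i, c) for i, c in enumerate(capacities)]
        self._turn = 0

    def alloc(self, size: int) -> int:
        # Try each node at most once, starting from the next one in turn.
        for _ in range(len(self.regions)):
            region = self.regions[self._turn]
            self._turn = (self._turn + 1) % len(self.regions)
            if region.next_free + size <= region.capacity:
                offset = region.next_free
                region.next_free += size
                return make_gaddr(region.node_id, offset)
        raise MemoryError("no node has enough free space")


if __name__ == "__main__":
    space = GlobalAllocator([1 << 30] * 6)  # six nodes, 1 GiB each (made-up sizes)
    first = space.alloc(4096)
    second = space.alloc(4096)
    print(split_gaddr(first))   # (0, 0)
    print(split_gaddr(second))  # (1, 0)
```

Encoding the node ID directly in the address lets any client route a request without a directory lookup; the trade-off in this sketch is a fixed cap on both the node count and per-node capacity.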

As Non-Volatile Memories (NVMs) enter the mainstream memory/storage market, we must consider how to secure NVM-equipped computing systems. The recent Meltdown and Spectre attacks are strong evidence that security must be intrinsic to computing systems rather than added as an afterthought. Processor vendors are taking the first steps by building security primitives into commodity processors. One security primitive associated with the use of emerging NVMs is memory encryption. Memory encryption, while necessary, is very challenging to use with NVMs because it exacerbates the write endurance problem. Secure architectures use cryptographic metadata that must be persisted and restored to allow secure recovery of data in the event of power loss. Specifically, encryption counters must be persistent to enable secure and functional recovery of an interrupted system. However, the cost of ensuring and maintaining persistence for these counters can be significant. In this paper, we propose a novel scheme to maintain encryption counters without the need for frequent updates. Our new memory controller design, Osiris, repurposes memory Error-Correction Codes (ECCs) to enable fast restoration and recovery of encryption counters. Since different counter-mode encryption schemes are used in industry and research, we provide a versatile Osiris implementation that improves performance and write endurance across different memory encryption schemes. To evaluate our design, we use Gem5 to run eight memory-intensive workloads selected from SPEC2006 and U.S. Department of Energy (DoE) proxy applications, as well as three computation-intensive graph algorithms from CRONO. Compared to a write-through counter-cache scheme, Osiris on average reduces memory writes by 45.8 percent (increasing lifetime by 1.86x) and reduces the performance overhead from 44.7 percent (for write-through) to only 4.49 percent. Furthermore, without requiring a backup battery or extra power-supply hold-up time, Osiris outperforms a battery-backed write-back scheme (4.4 versus 5.7 percent overhead) and generates less write traffic (1.8 versus 5.4 percent overhead).
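The abstract does not spell out how ECC bits help restore counters. One way to picture the idea, under our own assumptions, is the Python sketch below: counters live in a volatile counter cache and reach NVM only on every N-th update (a hypothetical `stop_loss` parameter), so after a crash the persisted value lags the true counter by fewer than N. Recovery tries each candidate counter in that window, decrypts the block, and accepts the candidate whose plaintext matches the stored check bits. HMAC-SHA256 stands in for the AES counter-mode pad and CRC32 stands in for real ECC; `EncryptedNVM`, `keystream`, `stop_loss`, and the method names are hypothetical, not Osiris's actual memory-controller logic.

```python
import hashlib
import hmac
import zlib


def keystream(key: bytes, addr: int, counter: int, length: int) -> bytes:
    """Counter-mode pad derived from (address, counter).
    Toy stand-in: HMAC-SHA256 blocks instead of a hardware AES engine."""
    out = b""
    block = 0
    while len(out) < length:
        msg = (addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
               + block.to_bytes(4, "little"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class EncryptedNVM:
    """Hypothetical model of encrypted NVM with lazily persisted counters."""

    def __init__(self, key: bytes, stop_loss: int):
        self.key = key
        self.stop_loss = stop_loss  # persist a counter only every N-th update
        self.cells: dict[int, tuple[bytes, int]] = {}  # NVM: addr -> (ciphertext, check bits)
        self.nvm_ctr: dict[int, int] = {}    # NVM: last persisted counter per block
        self.ctr_cache: dict[int, int] = {}  # volatile counter cache (lost on crash)

    def write(self, addr: int, plaintext: bytes) -> None:
        # After a crash, recover the true counter first so no pad is reused.
        if addr in self.cells and addr not in self.ctr_cache:
            self.recover_counter(addr)
        ctr = self.ctr_cache.get(addr, 0) + 1
        self.ctr_cache[addr] = ctr
        if ctr % self.stop_loss == 0:
            self.nvm_ctr[addr] = ctr  # the only counter write that reaches NVM
        ciphertext = xor(plaintext, keystream(self.key, addr, ctr, len(plaintext)))
        # Check bits are computed over the plaintext (CRC32 stands in for ECC).
        self.cells[addr] = (ciphertext, zlib.crc32(plaintext))

    def crash(self) -> None:
        self.ctr_cache.clear()  # power loss: cached counters are gone

    def recover_counter(self, addr: int) -> int:
        ciphertext, check = self.cells[addr]
        base = self.nvm_ctr.get(addr, 0)
        # The true counter lies in [base, base + stop_loss - 1].
        for cand in range(base, base + self.stop_loss):
            guess = xor(ciphertext, keystream(self.key, addr, cand, len(ciphertext)))
            if zlib.crc32(guess) == check:
                self.ctr_cache[addr] = cand
                return cand
        raise RuntimeError(f"no counter candidate matches for block {addr:#x}")

    def read(self, addr: int) -> bytes:
        if addr not in self.ctr_cache:
            self.recover_counter(addr)
        ciphertext, _ = self.cells[addr]
        return xor(ciphertext, keystream(self.key, addr, self.ctr_cache[addr], len(ciphertext)))


if __name__ == "__main__":
    mem = EncryptedNVM(key=b"demo-key-16bytes", stop_loss=4)
    for i in range(7):
        mem.write(0x40, f"version {i}".encode().ljust(32, b"\0"))
    mem.crash()                          # counter 7 was cached, only 4 was persisted
    print(mem.recover_counter(0x40))     # 7
    print(mem.read(0x40).rstrip(b"\0"))  # b'version 6'
```

In this sketch, a larger `stop_loss` means fewer counter writes to NVM but more candidate decryptions at recovery time; a real design would also have to account for the error-correcting behavior of actual ECC rather than a plain checksum.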

Articles published on Memory-intensive Workloads

A Hybrid Neuromorphic Object Tracking and Classification Framework for Real-Time Systems.

Optimizing pre-copy live virtual machine migration in cloud computing using machine learning-based prediction model

Polling-Based Memory Interface

FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers

Fine-Grained CPU Power Management Based on Digital Frequency Divider

OSM: Off-Chip Shared Memory for GPUs

FlexChain

Co-Design and System for the Supercomputer “Fugaku”

Pinning Page Structure Entries to Last-Level Cache for Fast Address Translation

Power and Performance Evaluation of Memory-Intensive Applications

Adaptive Granularity Based Last-Level Cache Prefetching Method with eDRAM Prefetch Buffer for Graph Processing Applications

TH-DPMS

A novel energy-efficient scheduling model for multi-core systems

Hierarchical Orchestration of Disaggregated Memory

MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks

Self-learnable Cluster-based Prefetching Method for DRAM-Flash Hybrid Main Memory Architecture

Disaggregated Cloud Memory with Elastic Block Management

Towards Low-Cost Mechanisms to Enable Restoration of Encrypted Non-Volatile Memories

Shared Last-Level Cache Management and Memory Scheduling for GPGPUs with Hybrid Main Memory

Towards optimal scheduling policy for heterogeneous memory architecture in many-core system