Lightweight Memory Research Articles

Hardware performance monitoring units (PMUs) are a standard feature in modern microprocessors, providing a rich set of microarchitectural event samplers. Recently, numerous profile-guided optimization (PGO) frameworks have exploited them to achieve much lower profiling overhead than conventional instrumentation-based frameworks. However, existing PGO frameworks mainly focus on optimizing the layout of binaries; they overlook the rich information that the PMU provides about data access behavior across the memory hierarchy. Thus, we propose MaPHeA, a lightweight Memory hierarchy-aware Profile-guided Heap Allocation framework applicable to both HPC and embedded systems. MaPHeA guides and applies the optimized allocation of dynamically allocated heap objects with very low profiling overhead and without additional user intervention to improve application performance. To demonstrate the effectiveness of MaPHeA, we apply it to three cases: optimizing heap object allocation in an emerging DRAM-NVM heterogeneous memory system (HMS), selective huge-page utilization, and controlling the cacheability of objects with low temporal locality. In an HMS, by identifying frequently accessed heap objects and placing them in the fast DRAM region, MaPHeA improves the performance of memory-intensive graph-processing and Redis workloads by 56.0% on average over the default configuration that uses DRAM as a hardware-managed cache of slow NVM. By identifying large heap objects that cause frequent TLB misses and allocating them to huge pages, MaPHeA increases the performance of the read and update operations of Redis by 10.6% over the transparent huge-page implementation of Linux. Also, by distinguishing the objects that cause cache pollution due to their low temporal locality and applying write-combining to them, MaPHeA improves the performance of STREAM and RADIX workloads by 20.0% on average over the system without cacheability control.
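The hot-object placement idea in the abstract above can be illustrated with a minimal sketch. The abstract does not describe MaPHeA's actual placement policy, so everything here is an assumption made for illustration: the `SiteProfile` record, the `plan_placement` function, the allocation-site names, and the greedy "samples per byte" ranking are all hypothetical, and the sample counts stand in for PMU-sampled memory accesses attributed to each allocation site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical per-allocation-site profile (not MaPHeA's real format)."""
    site: str        # label of the allocation site
    size_bytes: int  # total bytes allocated from this site
    samples: int     # sampled memory accesses attributed to this site


def plan_placement(profiles, dram_budget_bytes):
    """Greedy sketch: rank sites by sampled accesses per byte and place the
    densest ones in DRAM until the budget runs out; the rest go to NVM."""
    ranked = sorted(
        profiles,
        key=lambda p: p.samples / max(p.size_bytes, 1),
        reverse=True,
    )
    placement = {}
    remaining = dram_budget_bytes
    for p in ranked:
        if p.samples > 0 and p.size_bytes <= remaining:
            placement[p.site] = "DRAM"
            remaining -= p.size_bytes
        else:
            placement[p.site] = "NVM"
    return placement


# Illustrative, made-up profile data.
profiles = [
    SiteProfile("graph_edges", 4096, 9000),
    SiteProfile("log_buffer", 8192, 10),
    SiteProfile("index", 2048, 5000),
]
plan = plan_placement(profiles, dram_budget_bytes=6144)
```

With this made-up input, the two densely accessed sites fit in the DRAM budget and the rarely accessed buffer falls back to NVM. A real framework would also have to map PMU samples back to allocation sites and apply the plan at allocation time, which this sketch does not attempt.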

Dynamic binary translation (DBT) is a core technology for many important applications such as system virtualization, dynamic binary instrumentation, and security. However, several factors often impede its performance: 1) emulation overhead before translation; 2) translation and optimization overhead; and 3) translated code quality. Another issue is retargetability, that is, support for guest applications from different instruction-set architectures (ISAs) on host machines that also have different ISAs, which is an important feature for system virtualization. In this work, we take advantage of ubiquitous multicore platforms and use a multithreaded approach to implement DBT. By running the translator and the dynamic binary optimizer on different cores with different threads, we can offload the overhead incurred by DBT from the target applications and thus afford more sophisticated optimization techniques as well as retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and Low-Level Virtual Machine (LLVM) as our building blocks, we demonstrate with a multithreaded DBT prototype, called Hybrid-QEMU (HQEMU), that this approach can improve QEMU performance by a factor of 2.6x and 4.1x on the SPEC CPU2006 integer and floating-point benchmarks, respectively, for dynamic translation of x86 code to run on x86-64 platforms. For ARM code on x86-64 platforms, HQEMU gains a 2.5x speedup over QEMU for the SPEC CPU2006 integer benchmarks. We also address the performance scalability of multithreaded applications across ISAs. We identify two major impediments to performance scalability in QEMU: 1) coarse-grained locks used to protect shared data structures, and 2) inefficient emulation of atomic instructions across ISAs. We propose two techniques to mitigate these problems: 1) using indirect branch translation caching (IBTC) to avoid frequent accesses to locks, and 2) using lightweight memory transactions to emulate atomic instructions across ISAs. Our experimental results show that for multithreaded applications, HQEMU achieves a 25x speedup over QEMU on the PARSEC benchmarks.
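As a rough illustration of the indirect branch translation caching idea named in the abstract above, the following sketch models a small direct-mapped cache that each emulation thread would own privately. The abstract gives no implementation details, so the `IBTC` class, its slot count, and the `slow_lookup` callback (standing in for a lock-protected global lookup of translated code) are all hypothetical and do not reflect HQEMU's or QEMU's actual data structures.

```python
class IBTC:
    """Hypothetical per-thread indirect branch translation cache.

    Maps a guest branch target address to the host address of its
    translated code, so that repeated indirect branches avoid the slow,
    lock-protected global lookup.
    """

    def __init__(self, slots=256):
        if slots <= 0 or slots & (slots - 1):
            raise ValueError("slots must be a power of two")
        self.mask = slots - 1
        self.tags = [None] * slots     # guest addresses currently cached
        self.targets = [None] * slots  # matching host code addresses

    def lookup(self, guest_pc, slow_lookup):
        """Return the host address for guest_pc, refilling the slot on a miss."""
        i = guest_pc & self.mask
        if self.tags[i] == guest_pc:
            return self.targets[i]      # hit: no shared state touched
        host_pc = slow_lookup(guest_pc)  # miss: fall back to the slow path
        self.tags[i] = guest_pc
        self.targets[i] = host_pc
        return host_pc


# Usage with a made-up slow path that counts how often it is called.
slow_calls = []

def slow_lookup(guest_pc):
    slow_calls.append(guest_pc)
    return 0x7F000000 + guest_pc  # made-up host address

cache = IBTC(slots=16)
first = cache.lookup(0x4010, slow_lookup)
second = cache.lookup(0x4010, slow_lookup)
```

In this usage the second lookup of the same guest address is served from the thread-private cache, so the slow path runs only once. Invalidation when translated code is flushed is left out of the sketch.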

Articles published on Lightweight Memory

Spatiotemporal Focus and Lightweight Memory Network for Continuous Object Detection With Event Camera

An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies

Room-temperature superelasticity in Mg–Sc shape memory alloys revealed by first-principles calculations

Embedding security into ferroelectric FET array via in situ memory operation

MaPHeA: A Framework for Lightweight Memory Hierarchy-aware Profile-guided Heap Allocation

CPP: A lightweight memory page management extension to prevent code pointer leakage

Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU

A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores

Spintronic Computing-in-Memory Architecture Based on Voltage-Controlled Spin–Orbit Torque Devices for Binary Neural Networks

A Lightweight Memory Access Pattern Obfuscation Framework for NVM

Lightweight memory tracing for hot data identification

JavaScript AOT compilation

Lightweight and Seamless Memory Randomization for Mission-Critical Services in a Cloud Platform

Lightweight Memory Management for High Performance Applications in Consolidated Environments

Efficient and Retargetable Dynamic Binary Translation on Multicores

GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations

Towards Automated Memory Model Generation Via Event Tracing

COREMU

Transformation-based named entity extraction from spoken content for personal memory aid
