The SPEC CPU benchmarks are used extensively for evaluating and comparing improvements to computer systems. This ubiquity makes characterization critical: researchers need to understand which bottlenecks the benchmarks do and do not expose, and where new designs should and should not be expected to show impact. However, characterization involves a tradeoff between accuracy and reusability: the more precisely we characterize a benchmark’s performance on a given system, the less reusable that characterization is across different micro-architectures and memory configurations. Most existing SPEC characterizations include system-specific effects (e.g., via performance counters) and/or report only aggregate behavior (e.g., averages over the full application execution). While such approaches simplify characterization, they make it difficult to separate the applications’ intrinsic behavior from system-specific effects and/or lose the diverse phase-based behaviors. In this work we characterize the applications’ intrinsic memory behavior, isolating it from micro-architectural configuration specifics. We do so with a simplified, generic system model that evaluates the applications’ memory behavior across multiple cache sizes, with and without prefetching, and over time. The resulting characterization can be reused across a range of systems to understand application behavior and to see how frequently different behaviors occur. We use this approach to compare the SPEC 2006 and 2017 suites, providing insight into their memory system behavior beyond previous system-specific and/or aggregate results. We demonstrate the characterization’s use in different contexts by showing that a portion of the SPEC 2017 suite could benefit from giga-scale caches, despite aggregate results indicating otherwise.
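As an illustrative aside (not the authors' actual tooling), the following minimal Python sketch shows one way a generic, system-independent model of this kind could be realized: replay a memory address trace through a simple fully associative LRU cache at several capacities, optionally with a next-line prefetcher, and record the miss rate of each fixed-size window of accesses to capture behavior over time. The trace source, 64-byte line size, window length, and the next-line prefetcher are all assumptions made for the example.

```python
from collections import OrderedDict

LINE = 64  # assumed cache line size in bytes

def miss_rate_over_time(trace, cache_bytes, prefetch=False, window=100_000):
    """Replay an address trace through a fully associative LRU cache of
    `cache_bytes` capacity, optionally with a next-line prefetcher, and
    return the demand miss rate for each window of `window` accesses."""
    capacity = cache_bytes // LINE
    cache = OrderedDict()   # line address -> None, ordered by recency
    rates, misses = [], 0

    def touch(line, demand=True):
        nonlocal misses
        if line in cache:
            cache.move_to_end(line)          # update recency
        else:
            if demand:
                misses += 1                  # prefetches are not counted
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict least recently used line

    for i, addr in enumerate(trace, 1):
        line = addr // LINE
        touch(line)
        if prefetch:
            touch(line + 1, demand=False)    # simple next-line prefetch
        if i % window == 0:
            rates.append(misses / window)
            misses = 0
    return rates

# Hypothetical usage: sweep several cache sizes, with and without prefetching,
# over an address trace loaded elsewhere.
# for size in (32 << 10, 1 << 20, 32 << 20, 1 << 30):
#     for pf in (False, True):
#         print(size, pf, miss_rate_over_time(trace, size, prefetch=pf))
```

Sweeping such a model across cache sizes and plotting the per-window miss rates is one simple way to expose the phase-based behaviors that aggregate, whole-execution averages hide.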