Off-chip prefetching based on Hidden Markov Model for non-volatile memory architectures.

Adrián Lamela, Óscar G Ossorio, Benjamín Sahelices, Guillermo Vinuesa, Anandakumar Haldorai

doi:10.1371/journal.pone.0257047

Abstract

Non-volatile memory technology is now available in commodity hardware. This technology can be used as a backup memory for an external DRAM cache without needing to modify the software. However, the higher read and write latencies of non-volatile memory may exacerbate the memory wall problem. In this work we present a novel off-chip prefetch technique based on a Hidden Markov Model that specifically deals with the latency problem caused by the complexity of off-chip memory access patterns. First, we present a thorough analysis of off-chip memory access patterns to identify their complexity in multicore processors. Based on this study, we propose a prefetching module located in the LLC that uses two small tables and whose computational complexity is linear in the number of computing threads. Our Markov-based technique is able to track and cluster several simultaneous groups of memory accesses coming from multiple simultaneous threads in a multicore processor. It can quickly identify complex address groups and trigger prefetches with very high accuracy. Our simulations show an improvement of up to 76% in the hit ratio of an off-chip DRAM cache for multicore architectures over the conventional prefetch technique (G/DC). Also, the overhead of prefetch requests (failed prefetches) is reduced by 48% in single-core simulations and by 83% in multicore simulations.
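As a rough illustration of table-driven prefetching over several interleaved access streams, the sketch below uses a plain first-order Markov chain over address deltas. It is not the Hidden Markov Model proposed in the paper: the class name `MarkovDeltaPrefetcher`, the `REGION_BITS` constant, the grouping of addresses by fixed-size region, and the layout of the two tables are all assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Hypothetical parameter: addresses sharing the same upper bits are
# treated as one access group (here, 4 KiB regions).
REGION_BITS = 12


class MarkovDeltaPrefetcher:
    """Toy first-order Markov predictor over address deltas (illustrative only)."""

    def __init__(self):
        # Table 1 (assumed layout): per-region (last address, last delta).
        self.last = {}
        # Table 2 (assumed layout): counts of delta -> next-delta transitions.
        self.transitions = defaultdict(Counter)

    def access(self, addr):
        """Record one access and return a predicted prefetch address, or None."""
        region = addr >> REGION_BITS
        if region not in self.last:
            # First access to this region: nothing to learn or predict yet.
            self.last[region] = (addr, None)
            return None

        last_addr, last_delta = self.last[region]
        delta = addr - last_addr
        if last_delta is not None:
            # Learn that `last_delta` was followed by `delta`.
            self.transitions[last_delta][delta] += 1
        self.last[region] = (addr, delta)

        candidates = self.transitions[delta]
        if not candidates:
            return None  # no history for this delta yet
        next_delta = candidates.most_common(1)[0][0]
        return addr + next_delta


if __name__ == "__main__":
    # Two interleaved streams in different regions, with strides 64 and 128.
    prefetcher = MarkovDeltaPrefetcher()
    for a in [0x1000, 0x8000, 0x1040, 0x8080, 0x1080, 0x8100]:
        predicted = prefetcher.access(a)
        print(hex(a), "->", hex(predicted) if predicted is not None else None)
```

Keeping per-region state separate from the shared transition counts lets the two interleaved streams be followed independently, which is the general idea of tracking several simultaneous address groups; the actual clustering and prediction logic of the paper's module may differ substantially.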

Highlights

Non-Volatile Memory architectures have very high memory density with very low energy consumption, which allows for massive data sets in main RAM memory that can be directly accessed by the cores [1,2,3,4,5,6].
astar, bzip2 and mcf show a better hit ratio with our proposal, HMM.
HMM adapts well to this complexity and to the multiple memory areas involved simultaneously in each iteration. astar implements pathfinding algorithms that traverse graphs mapping regions with neighborhood relationships, so its data access pattern is complex, and HMM adapts to it better than its competitors.

Summary

Introduction

Non-Volatile Memory architectures have very high memory density with very low energy consumption, which allows for massive data sets in main RAM memory that can be directly accessed by the cores [1,2,3,4,5,6]. These three applications are representative of different types of misses, but all three share a sharp reduction in the number of repeated off-chip accesses to the same locations and a clear increase in the complexity of access patterns. This behavior prevents the DRAM cache from achieving a good hit ratio and latency. As can be seen, spatial locality exists, but it is complex since it replicates an algorithmic behavior. For all benchmarks except lbm, OPKC decreases markedly with better LLC configurations, and so fewer requests reach the DRAM cache. This leads to a strong need to develop complex prefetch techniques for the DRAM cache to deal with the complex off-chip access patterns.

Related work

Evaluation

Evaluation setup

Conclusions