Data Cache Research Articles

Value prediction improves instruction level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most of the state-of-the-art value prediction approaches require high hardware cost, which is the main obstacle for its wide adoption in current processors. To tackle this issue, we revisit load value prediction as an efficient alternative to the classical approaches that predict all instructions. By speculating only on loads, the pressure over shared resources (e.g., the Physical Register File) and the predictor size can be substantially reduced (e.g., more than 90% reduction compared to recent works). We observe that existing value predictors cannot achieve very high performance when speculating only on load instructions. To solve this problem, we propose a new, accurate and low-cost mechanism for predicting the values of load instructions: the Address-first Value-next Predictor with Value Prefetching (AVPP). The key idea of our predictor is to predict the load address first (which, we find, is much more predictable than the value) and to use a small non-speculative Value Table (VT)—indexed by the predicted address—to predict the value next. To increase the coverage of AVPP, we aim to increase the hit rate of the VT by predicting also the load address of a future instance of the same load instruction and prefetching its value in the VT. We show that AVPP is relatively easy to implement, requiring only 2.5% of the area of a 32KB L1 data cache. We compare our mechanism with five state-of-the-art value prediction techniques, evaluated within the context of load value prediction, in a relatively narrow out-of-order processor. On average, our AVPP predictor achieves 11.2% speedup and 3.7% of energy savings over the baseline processor, outperforming all the state-of-the-art predictors in 16 of the 23 benchmarks we evaluate. We evaluate AVPP implemented together with different prefetching techniques, showing additive performance gains (20% average speedup). In addition, we propose a new taxonomy to classify different value predictor policies regarding predictor update, predictor availability, and in-flight pending updates. We evaluate these policies in detail.

Read full abstract

In recent years, the increasing design complexity and the problems of power and heat dissipation have caused a shift in processor technology to favor Chip Multiprocessors. In Chip Multiprocessors (CMP) architecture, it is common that multiple cores share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning jobs to cores appropriately so that the contention and consequent performance degradations are minimized. This dissertation aims to tackle two of the most prominent challenges in job co-scheduling. The first challenge is in the computational complexity for determining optimal job co-schedules. This dissertation presents one of the first systematic analyses on the complexity of job co-scheduling. Besides proving the NP completeness of job co-scheduling, it introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it empirically demonstrates the feasibility for approximating the optimal schedules effectively by proposing several heuristics-based algorithms. These discoveries facilitate the assessment of job co-schedulers by providing necessary baselines, and shed insights to the development of practical co-scheduling systems. The second challenge resides in the prediction of the performance of processes co-running on a shared cache. This dissertation explores the influence on co-run performance prediction imposed by co-runners, program inputs, and cache configurations. Through a sequence of formal analysis, we derive an analytical co-run locality model, uncovering the inherent statistical connections between the data references of programs single-runs and their co-run locality. The model offers theoretical insights on co-run locality analysis and leads to a lightweight approach for fast prediction of shared cache performance. We demonstrate the effectiveness of the model in enabling proactive job co-scheduling. Together, the two-dimensional findings open up many new opportunities for cache management on modern CMP by laying the foundation for job co-scheduling, and enhancing the understanding to data locality and cache sharing significantly.

Read full abstract

Data Cache Research Articles

Related Topics

Articles published on Data Cache

PR-LRU: partial random LRU technique for performance improvement of last level cache

AccConF: An Access Control Framework for Leveraging In-Network Cached Data in the ICN-Enabled Wireless Edge

ICRA: index based cache replacement algorithm for cloud storage

Hybrid Remote Access Protocol

ICRA: index based cache replacement algorithm for cloud storage

PR-LRU: partial random LRU technique for performance improvement of last level cache

Design and Analysis of an Effective Two-Step Clustering Scheme to Optimize Prefetch Cache Technology

SCORE

Analysis of the method of optimized data caching in the Content Delivery Network

AVPP

Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Hardware Design of Multichannel Video Acquisition System Based on FPGA

A Column-aware Data Caching Method and System

Analyzing Data Cache Related Preemption Delay With Multiple Preemptions

Information-Centric Networking Security

Automatic Orbit Selection for a Radio Interferometric Spacecraft Constellation

Analysis and approximation of optimal co-scheduling on cmp

Spatial Intelligence toward Trustworthy Vehicular IoT

Dynamic Network Formation Game With Social Awareness in D2D Communications

Adaptive multilevel fuzzy-based authentication framework to mitigate Cache side channel attack in cloud computing

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Data Cache Research Articles

Related Topics

Articles published on Data Cache

PR-LRU: partial random LRU technique for performance improvement of last level cache

AccConF: An Access Control Framework for Leveraging In-Network Cached Data in the ICN-Enabled Wireless Edge

ICRA: index based cache replacement algorithm for cloud storage

Hybrid Remote Access Protocol

ICRA: index based cache replacement algorithm for cloud storage

PR-LRU: partial random LRU technique for performance improvement of last level cache

Design and Analysis of an Effective Two-Step Clustering Scheme to Optimize Prefetch Cache Technology

SCORE

Analysis of the method of optimized data caching in the Content Delivery Network

AVPP

Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Hardware Design of Multichannel Video Acquisition System Based on FPGA

A Column-aware Data Caching Method and System

Analyzing Data Cache Related Preemption Delay With Multiple Preemptions

Information-Centric Networking Security

Automatic Orbit Selection for a Radio Interferometric Spacecraft Constellation

Analysis and approximation of optimal co-scheduling on cmp

Spatial Intelligence toward Trustworthy Vehicular IoT

Dynamic Network Formation Game With Social Awareness in D2D Communications

Adaptive multilevel fuzzy-based authentication framework to mitigate Cache side channel attack in cloud computing