Summary

Sparse matrix-vector multiplication dominates the performance of many scientific and industrial problems. For example, iterative methods for solving linear systems often rely on the performance of this critical operation. The particular case of binary matrices shows up in several important areas of computing, such as graph theory and cryptography. Unfortunately, irregular memory access patterns cause poor memory throughput, slowing down this operation. To maximize memory throughput, we translate the matrix into a straight-line program that takes advantage of the CPU's instruction cache and hardware prefetchers. The regular, loopless pattern of the program reduces cache misses, thus decreasing the latency for most instructions. We focus on the widely used x86_64 architecture and on binary matrices, exploring several possible tradeoffs regarding memory access policies and code size. We also consider matrices with elements over various mathematical structures, such as floating-point reals and integers modulo m. When compared to a Compressed Row Storage implementation, we obtain significant speedups.

Copyright © 2014 John Wiley & Sons, Ltd.