Conflict-Free Accesses to Strided Vectors on a Banked Cache

A. Seznec, R. Espasa

doi:10.1109/tc.2005.110

Abstract

With the advance of integration technology, it has become feasible to implement a microprocessor, a vector unit, and a multimegabyte bank-interleaved L2 cache on a single die. Parallel access to strided vectors on the L2 cache is a major performance issue on such vector microprocessors. A major difficulty for such a parallel access is that one would like to interleave the cache on a block size basis in order to benefit from spatial locality and to maintain a low tag volume, while strided vector accesses naturally work on a word granularity. In this paper, we address this issue. Considering a parallel vector unit with 2^n independent lanes, a 2^n-bank interleaved cache, and a cache line size of 2^k words, we show that any slice of 2^(n+k) consecutive elements of any strided vector with stride 2^r R, with R odd and r ≤ k, can be accessed in the L2 cache and routed back to the lanes in 2^k subslices of 2^n elements.
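The counting argument behind this claim can be illustrated with a small sketch. It is not the access or routing scheme of the paper. It only checks a necessary condition: a slice of 2^(n+k) elements places exactly 2^k elements in each of the 2^n banks. The bank mapping used here (bank index = bits k to k+n-1 of the word address) is an assumption standing in for "interleaved on a block size basis", and the function names are hypothetical.

```python
from collections import Counter


def bank_of(word_addr, n, k):
    """Bank index under an ASSUMED line-interleaved mapping:
    drop the k word-in-line bits, keep the next n bits."""
    return (word_addr >> k) % (1 << n)


def bank_histogram(base, stride, n, k):
    """Count how many of the 2^(n+k) slice elements fall in each bank."""
    count = 1 << (n + k)
    return Counter(bank_of(base + i * stride, n, k) for i in range(count))


def is_balanced(base, stride, n, k):
    """True if every one of the 2^n banks receives exactly 2^k elements."""
    hist = bank_histogram(base, stride, n, k)
    return len(hist) == (1 << n) and all(c == (1 << k) for c in hist.values())


if __name__ == "__main__":
    n, k = 2, 3  # 4 banks, 8 words per line (illustrative values)
    # stride 12 = 2^2 * 3, so r = 2 <= k and R = 3 is odd
    print(is_balanced(base=5, stride=12, n=n, k=k))
    # stride 16 = 2^4, so r = 4 > k: outside the stated condition
    print(is_balanced(base=0, stride=16, n=n, k=k))
```

Under this assumed mapping, a balanced histogram is what makes a schedule of 2^k subslices of 2^n elements possible in principle; choosing which elements go in which subslice, and routing them back to the lanes, is the part the paper addresses.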

Full Text