Abstract
Inefficient data access to cache memory has become a bottleneck for efficient 2-dimensional (2-D) data processing, such as image processing and matrix multiplication. To solve this problem, a cache memory with both unit tile and unit line accessibility, based on a 4-level Z-order tiling layout, is proposed. Conventional raster scan order access to this layout is enabled via hardware-based address translation, which eliminates the overhead of address calculation. The proposed cache can access data in parallel in the column (unit tile) and row (unit line) directions by using the 4-level Z-order tiling layout and a multi-bank cache organization. Unit tile access, which corresponds to parallel data access in the column direction, can exploit 2-D locality. Simulation results show that the 4-level Z-order tiling layout incurs fewer TLB and L1 data cache misses than the raster scan order and Morton order layouts in the matrix multiplication (especially for larger matrix sizes), LU decomposition, successive over-relaxation, and matrix transposition benchmarks. An LSI chip of the proposed cache combined with a SIMD-based datapath was designed in a 2.5 × 5 mm² area using 0.18 μm CMOS technology. With a 3.8 ns clock period, the read latency was held to 3 clock cycles, the same as that of the conventional cache memory of an Intel or ARM high-performance processor.
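To make the idea of translating raster scan coordinates into a tiled Z-order layout concrete, the following is a minimal software sketch. It is not the 4-level layout or the hardware translation described above, whose details the abstract does not give. It assumes a generic two-level scheme in which tiles are ordered by a Morton (Z-order) index and elements inside each tile stay in raster order. The names `interleave_bits`, `tiled_offset`, `tile_bits`, and `grid_bits` are hypothetical.

```python
def interleave_bits(row: int, col: int, bits: int) -> int:
    """Return the Morton (Z-order) index of (row, col).

    Interleaves the low `bits` bits of each coordinate:
    column bits land in even positions, row bits in odd positions.
    """
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index


def tiled_offset(row: int, col: int, tile_bits: int, grid_bits: int) -> int:
    """Map a raster coordinate to an offset in a two-level tiled layout.

    Assumption (illustrative only): square tiles of side 2**tile_bits,
    tiles placed in Z-order over a grid of side 2**grid_bits tiles,
    and raster order inside each tile.
    """
    tile_size = 1 << tile_bits
    mask = tile_size - 1
    tile_row, in_row = row >> tile_bits, row & mask
    tile_col, in_col = col >> tile_bits, col & mask
    tile_index = interleave_bits(tile_row, tile_col, grid_bits)
    return tile_index * tile_size * tile_size + in_row * tile_size + in_col


if __name__ == "__main__":
    # Print the layout offsets for an 8x8 array made of 2x2 tiles.
    for r in range(8):
        print(" ".join(f"{tiled_offset(r, c, 1, 2):2d}" for c in range(8)))
```

In this sketch, elements that are neighbours in both the row and column directions end up close together in memory, which is the 2-D locality property the tiled layout relies on. A hardware translator would perform the same bit rearrangement with wiring rather than a loop.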