Abstract
Inefficient column‐directional cache memory access has become a bottleneck for two‐dimensional (2‐D) data processing that uses extended single instruction, multiple data (SIMD) instructions. To solve this problem, we propose a cache memory with both tile (column and row directions) and line (row direction) accessibility. 2‐D data access to the proposed cache is enabled by a hardware‐based multi‐mode address translation unit that eliminates the overhead of software‐based address calculation. To reduce the hardware overhead of the proposed cache, we propose a tag memory reduction method that replaces multiple tiles with an aligned tile set (RATS) in the cache. To verify the feasibility of the proposed cache, an LSI layout of a SIMD‐based, general‐purpose datapath embedding the proposed cache was designed in a 2.5 × 5 mm² area using 0.18‐μm CMOS technology. Under a 3.9‐ns clock period (250 MHz), the read latency is limited to 3 clock cycles, the same as for the conventional cache memory. Using the RATS method, the entire hardware overhead of the proposed cache is reduced to only 7% of that required for a conventional cache. In addition, simulation results indicate that, for power‐of‐two matrix sizes, the proposed cache considerably reduces L1 and L2 cache conflict misses compared with a conventional cache, because the column‐directional address stride is sufficiently smaller than the page size. The proposed cache therefore provides column‐directional parallel access as efficiently as row‐directional parallel access, enabling efficient SIMD operation without transposition in matrix multiplication (MM). For LU decomposition (LUD), the proposed cache gives a column‐major–based LUD program almost the same performance as a row‐major–based LUD program.
These results show that the proposed cache does not restrict our freedom in selecting either row‐ or column‐major order coding.
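The conflict-miss argument for power-of-two matrix sizes can be illustrated with a small software model. The sketch below is only an analogy and not the proposed hardware: it compares where an 8-element column segment lands in a set-indexed cache under a plain row-major layout and under a tile-major layout. The cache geometry (32-byte lines, 128 sets), the 8×8 tile size, the 4-byte element size, and all function names are assumptions chosen for illustration. None of them come from the paper, and the sketch does not model the multi-mode address translation unit or the RATS method.

```python
# Illustrative model only: all parameters below are assumed, not taken from the paper.
LINE_BYTES = 32   # assumed cache line size
NUM_SETS = 128    # assumed number of cache sets
ELEM_BYTES = 4    # assumed element size (e.g. 32-bit values)
TILE = 8          # assumed tile edge length (8x8 elements)


def set_index(addr):
    """Cache set selected by a byte address in a simple set-indexed cache."""
    return (addr // LINE_BYTES) % NUM_SETS


def row_major_addr(row, col, n_cols):
    """Byte address of element (row, col) in a plain row-major matrix."""
    return (row * n_cols + col) * ELEM_BYTES


def tiled_addr(row, col, n_cols):
    """Byte address when each TILE x TILE block is stored contiguously."""
    tiles_per_row = n_cols // TILE
    tile_id = (row // TILE) * tiles_per_row + (col // TILE)
    offset = (row % TILE) * TILE + (col % TILE)
    return (tile_id * TILE * TILE + offset) * ELEM_BYTES


def distinct_sets(addr_fn, rows, col, n_cols):
    """Number of distinct cache sets touched by one column segment."""
    return len({set_index(addr_fn(r, col, n_cols)) for r in rows})


N = 1024                 # power-of-two matrix width
segment = range(TILE)    # rows 0..7 of column 0

linear_sets = distinct_sets(row_major_addr, segment, 0, N)
tiled_sets = distinct_sets(tiled_addr, segment, 0, N)

print("row-major layout:", linear_sets, "distinct set(s)")
print("tile-major layout:", tiled_sets, "distinct set(s)")
```

Under these assumed parameters, the row-major column stride is 4096 bytes, so all eight lines of the segment compete for the same set, while the tile-major stride is 32 bytes and the eight lines spread over eight sets. This is the kind of effect a small column-directional stride has on conflict misses.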
More From: Concurrency and Computation: Practice and Experience