Abstract

Matrix transposition is a fundamental operation, but for large matrices it can exhibit a very low and hard-to-predict data cache hit ratio. Safe (worst-case) hit ratio predictability is required in real-time systems. In this paper, we obtain the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio, assuming a Least-Recently-Used (LRU) data cache. Based on our analytical assessments, we compare a tiling matrix transposition to a cache oblivious algorithm, modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiling version achieves an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix transposition on real hardware with pseudo-LRU (PLRU) caches. Our analytical hit/miss assessment enables the use of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bounded. In general-purpose and high-performance computing, our analysis makes it possible to restrict the cache resources devoted to matrix transposition with no negative impact, reducing both the energy consumption and the cache pollution imposed on other computations.

Highlights

  • Matrix transposition is a fundamental operation in linear algebra, fast Fourier transforms, and so forth, and has many applications in areas such as numerical analysis, image processing, and graphics

  • We study in-place matrix transposition theoretically, and how the cache parameters of a Least-Recently-Used (LRU) cache and the padding applied to the matrix affect the data hit ratio of the tiling version

  • A further problem arises for the cache oblivious matrix transposition algorithm, which benefits from the application of phantom padding; this padding is analyzed in this paper

Summary

Introduction

Matrix transposition is a fundamental operation in linear algebra, fast Fourier transforms, and so forth, and has many applications in areas such as numerical analysis, image processing, and graphics. As Tsifakis et al. conclude [8], “predicting a priori how the cache oblivious algorithm will perform is non-trivial”, so the previous drawbacks are still present, although the cache oblivious algorithm avoids adding the tile size as an extra parameter to consider. We study in-place matrix transposition theoretically, and how the cache parameters (number of sets, ways, and line size) of a Least-Recently-Used (LRU) cache and the padding applied to the matrix affect the data hit ratio of the tiling version. Hit/miss results can be predicted for the tiling version of matrix transposition (which is mandatory for real-time applications) in an LRU cache and, under specific cache configurations, in a PLRU cache.
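To make the interplay between tile size, padding, and cache parameters concrete, the following is a minimal sketch of a tiled in-place transposition instrumented with a toy set-associative LRU cache model. It is not the paper's implementation. The names `LRUCache` and `tiled_transpose`, the row-major layout with leading dimension `ld` (so `ld - n` is the padding), and the access model (each swap counts as one access to each of its two elements, with addresses measured in elements) are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy set-associative data cache with LRU replacement.

    Hypothetical model: addresses are element indices, and a cache line
    holds `line_elems` consecutive elements.
    """

    def __init__(self, num_sets, ways, line_elems):
        self.num_sets = num_sets
        self.ways = ways
        self.line_elems = line_elems
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        line = addr // self.line_elems
        cache_set = self.sets[line % self.num_sets]
        self.accesses += 1
        if line in cache_set:
            cache_set.move_to_end(line)        # mark as most recently used
            self.hits += 1
        else:
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True

    def hit_ratio(self):
        return self.hits / self.accesses if self.accesses else 0.0


def tiled_transpose(a, n, ld, tile, cache=None):
    """Transpose in place the n x n matrix stored row-major in the flat
    list `a` with leading dimension `ld` (ld - n padding elements per row).

    Tiles on or above the diagonal are visited; each element above the
    diagonal is swapped with its mirror below the diagonal.
    """
    for ii in range(0, n, tile):
        for jj in range(ii, n, tile):
            for i in range(ii, min(ii + tile, n)):
                # In a diagonal tile, only visit elements above the diagonal.
                j_start = i + 1 if ii == jj else jj
                for j in range(j_start, min(jj + tile, n)):
                    p, q = i * ld + j, j * ld + i
                    if cache is not None:
                        cache.access(p)
                        cache.access(q)
                    a[p], a[q] = a[q], a[p]
```

Under these assumptions, sweeping `tile`, `ld`, `num_sets`, `ways`, and `line_elems` and reading `cache.hit_ratio()` gives a quick way to explore when conflict misses appear; the analytical relations themselves are those derived in the paper, not this simulation.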

Related Work
Ideal Data Cache Hit Ratios in Matrix Transposition
Bound of the Ideal Hit Ratio with a Cache Line Holding L Elements
Reducing the Requirements of Set-Associative LRU Caches with Further Padding
Experiments
Data Cache Hit Ratio in Tiled Matrix Transposition
Experiments tile dimension
10. Data ratios
Analytical Modeling of Energy Consumption
Energy Consumption Estimation and Possible Improvements
Findings
Conclusions