Abstract

The purpose of this paper is to highlight the performance issues of matrix transposition algorithms for large matrices that relate to the Translation Lookaside Buffer (TLB) cache. Existing optimisation techniques such as coalesced access and the use of shared memory, despite their necessity and benefits, are not sufficient to neutralise the problem. As the problem size increases, these optimisations do not exploit data locality effectively enough to counteract the detrimental effects of TLB cache misses. We propose a new optimisation technique that counteracts the performance degradation of these algorithms and seamlessly complements current optimisations. Our optimisation is based on a detailed analysis of enumeration schemes that can be applied to either individual matrix entries or blocks (sub-matrices). The key advantage of these enumeration schemes is that they do not incur matrix storage format conversion, because they operate on canonical matrix layouts. In addition, we offer several cache-efficient matrix transposition algorithms based on enumeration schemes: an improved version of the in-place algorithm for square matrices, an out-of-place algorithm for rectangular matrices and two 3D involutions. We demonstrate that the choice of enumeration scheme and its parametrisation can have a direct and significant impact on the algorithm's memory access pattern. Our in-place version of the algorithm delivers up to 100% performance improvement over the existing optimisation techniques. Meanwhile, for the out-of-place version we observe up to 300% performance gain over NVIDIA's algorithm. We also offer improved versions of two involution transpositions for 3D matrices that can achieve performance increases of up to 300%. To the best of our knowledge, this is the first effective attempt to control the logical-to-physical block association through the design of enumeration schemes in the context of matrix transposition.
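The idea that an enumeration scheme only decides which logical block each launched block processes, while the matrix stays in its canonical row-major layout, can be sketched as a sequential Python model. This is a minimal illustration under our own assumptions, not the paper's schemes: `row_major` is the canonical order, `diagonal` is the diagonal block reordering known from NVIDIA's CUDA transpose example, and the names `transpose_tiled`, `tile` and `Enumeration` are hypothetical.

```python
from typing import Callable, List, Tuple

Matrix = List[List[int]]
# An enumeration maps a linear block id to the logical tile (bx, by) it processes.
Enumeration = Callable[[int, int, int], Tuple[int, int]]


def row_major(bid: int, grid_x: int, grid_y: int) -> Tuple[int, int]:
    """Canonical enumeration: tiles are visited row by row."""
    return bid % grid_x, bid // grid_x


def diagonal(bid: int, grid_x: int, grid_y: int) -> Tuple[int, int]:
    """Diagonal block reordering (square grids only), as in NVIDIA's CUDA transpose example."""
    assert grid_x == grid_y, "this variant assumes a square grid of tiles"
    launch_x, launch_y = bid % grid_x, bid // grid_x
    # bx = (x + y) mod g, by = x: a bijection on a square grid
    return (launch_x + launch_y) % grid_x, launch_x


def transpose_tiled(a: Matrix, tile: int, enum: Enumeration) -> Matrix:
    """Out-of-place transpose of a row-major matrix, one tile per 'block'.

    The storage layout never changes; the enumeration only decides which
    tile each successive block id works on.
    """
    rows, cols = len(a), len(a[0])
    out: Matrix = [[0] * rows for _ in range(cols)]
    grid_x = (cols + tile - 1) // tile  # tiles per row
    grid_y = (rows + tile - 1) // tile  # tiles per column
    for bid in range(grid_x * grid_y):  # blocks in launch order
        bx, by = enum(bid, grid_x, grid_y)
        for r in range(by * tile, min((by + 1) * tile, rows)):
            for c in range(bx * tile, min((bx + 1) * tile, cols)):
                out[c][r] = a[r][c]
    return out
```

Both enumerations produce the same transposed matrix; only the order in which tiles, and therefore memory pages, are visited changes. That visiting order is the property an enumeration scheme can control without any storage format conversion.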

Highlights

  • Matrix transposition is one of the fundamental operations in linear algebra, and is used in many scientific and engineering applications [9]

  • This mapping is achieved using various enumeration schemes and can be applied to either cores [10] or blocks, as we propose

  • We describe in detail how enumeration schemes can be used to mitigate performance problems associated with Translation Lookaside Buffer (TLB) cache misses, and how to control the memory access pattern through them

  • We offer an improved version of a thread-wise algorithm that delivers stable performance and high throughput regardless of matrix size

  • We propose a modified version of NVIDIA's out-of-place algorithm by applying an enumeration scheme that delivers sustained high throughput for large matrices

  • We demonstrate that the 3D matrix transposition presented in [14] is susceptible to TLB cache misses

  • This paper presents an improved version of two matrix transposition algorithms
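To make the TLB argument concrete, the following toy model counts how many distinct memory pages a window of consecutively scheduled blocks touches under two enumerations. It is a simplification with stated assumptions: square row-major matrices, separate input and output buffers, a hypothetical `page_bytes` standing in for the memory covered by one TLB entry, and a hypothetical `grouped` enumeration that is not one of the paper's schemes.

```python
from typing import Callable, Set, Tuple

Enumeration = Callable[[int, int], Tuple[int, int]]


def row_major(bid: int, grid: int) -> Tuple[int, int]:
    """Canonical order: consecutive ids walk along a block-row."""
    return bid % grid, bid // grid


def grouped(bid: int, grid: int, group: int = 2) -> Tuple[int, int]:
    """Hypothetical grouped order: ids fill group x group squares of blocks first.

    Assumes `grid` is a multiple of `group`.
    """
    g, local = divmod(bid, group * group)
    groups_per_row = grid // group
    gx, gy = g % groups_per_row, g // groups_per_row
    return gx * group + local % group, gy * group + local // group


def pages_touched(enum: Enumeration, n: int, tile: int, window: int,
                  start: int, elem_bytes: int, page_bytes: int) -> int:
    """Distinct (buffer, page) pairs touched by `window` consecutive blocks.

    Block (bx, by) reads input rows by*tile.. and writes output rows bx*tile..
    """
    grid = n // tile
    pages: Set[Tuple[str, int]] = set()
    for bid in range(start, start + window):
        bx, by = enum(bid, grid)
        for i in range(tile):
            for j in range(tile):
                r, c = by * tile + i, bx * tile + j  # element read from input
                pages.add(("in", (r * n + c) * elem_bytes // page_bytes))
                # the same element is written to output position (c, r)
                pages.add(("out", (c * n + r) * elem_bytes // page_bytes))
    return len(pages)
```

With an 8×8 matrix, 2×2 tiles, one matrix row per page and a window of four blocks, `pages_touched(row_major, 8, 2, 4, 0, 1, 8)` gives 10 (2 input pages, 8 output pages), while the grouped order gives 8 (4 + 4). Real behaviour depends on the GPU's TLB organisation and block scheduling; the model only shows that the enumeration, not the data layout, determines the page footprint of concurrently running blocks.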


Summary

Introduction

Matrix transposition is one of the fundamental operations in linear algebra, and is used in many scientific and engineering applications [9]. Based on the optimisation techniques described in [24], a new range of efficient 3D matrix transposition algorithms has been proposed in [14]. We present an extended version of the concept of mapping a rectangular grid of elements onto a triangular part of a matrix. We extend the analysis of this technique described in [10] and propose an improved version of the transposition algorithm. We also propose a modified version of NVIDIA's out-of-place algorithm by applying an enumeration scheme that delivers sustained high throughput for large matrices.
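Mapping a rectangular grid of elements onto the triangular part of a matrix can be sketched with a triangular-number pairing function. The sketch assumes a row-by-row enumeration of the strictly lower triangle and models each GPU thread as one loop iteration; `tri_index`, `transpose_in_place` and `grid_w` are hypothetical names, and the enumeration is not necessarily the one used by the authors.

```python
import math
from typing import List, Tuple


def tri_index(k: int) -> Tuple[int, int]:
    """Map k in [0, n(n-1)/2) to (i, j) with i > j.

    Enumerates the strictly lower triangle row by row:
    (1,0), (2,0), (2,1), (3,0), ...  Row i starts at offset i(i-1)/2.
    """
    i = (1 + math.isqrt(1 + 8 * k)) // 2  # exact integer inverse of i(i-1)/2
    j = k - i * (i - 1) // 2
    return i, j


def transpose_in_place(a: List[List[int]], grid_w: int) -> None:
    """In-place transpose of a square matrix using a rectangular 'thread' grid."""
    n = len(a)
    pairs = n * (n - 1) // 2  # number of off-diagonal pairs to swap
    grid_h = (pairs + grid_w - 1) // grid_w  # rows needed in the launch grid
    for ty in range(grid_h):
        for tx in range(grid_w):
            k = ty * grid_w + tx  # linear id of the "thread" at (tx, ty)
            if k >= pairs:
                continue  # padding threads past the triangle do nothing
            i, j = tri_index(k)
            a[i][j], a[j][i] = a[j][i], a[i][j]
```

Because every index below n(n-1)/2 maps to a distinct off-diagonal pair, each pair is swapped exactly once and no two "threads" touch the same elements. Using the integer square root `math.isqrt` avoids floating-point rounding errors for large k.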

Prior Art
Background
Problem Definition
Transparent Block Reordering
Involution Transposition Optimisation
Enumeration Schemes
Basic Schemes
Basic Pairing Functions
V1 Scheme
Banded Schemes
V1B Scheme
Rectangular Schemes
Performance Evaluation
Findings
Conclusion