Abstract

This article presents a fast SIMD Hilbert space-filling curve generator, which supports a new cache-oblivious blocking-scheme technique applied to the out-of-place transposition of general matrices. Matrix operations found in high performance computing libraries are usually parameterized based on host microprocessor specifications to minimize data movement within the different levels of memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations. This type of algorithm provides an elegant and portable solution to address the lack of standardization in modern-day processors. Our solution consists in an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. This scheme traverses the input matrix, in O(nm) time and space, improving the behavior of matrix algorithms that inherently present poor memory locality. The application of this technique to the problem of out-of-place matrix transposition achieved competitive results when compared to state-of-the-art approaches. The performance of our solution surpassed Intel MKL version after employing standard software prefetching techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call