Fast Blockwise Matrix-Matrix Multiplication Using AVX and Prefetching on Shared Memory

Nwe Zin Oo, Panyayot Chaikan

doi:10.23919/ascc56756.2022.9828222

Publication Date: May 4, 2022

Affiliation: Prince of Songkla University

Abstract

In recent multicore architectures, parallel computing with vectorization has emerged as a way to speed up mathematical calculations and image processing by exploiting Intel Advanced Vector Extensions (AVX). The performance of parallel programs on modern processors depends on many factors, such as the amount of memory, the size of the caches, the number of available processors, and the programming methodology. In addition, the numerous vector intrinsics supported by AVX perform floating-point calculations that can be optimized by using AVX-256 or AVX-512 registers. In this paper, a fast recursive matrix multiplication is proposed that combines AVX and OpenMP with software prefetching for block-wise matrix-matrix multiplication on shared memory. The proposed version uses prefetching to reduce memory latency by fetching memory pages before they are used. For matrices of size 8192×8192, prefetching reduced the execution time by about 22% and improved performance by about 17% compared with the version without prefetching, when tested on an Intel Core i7 processor.
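The abstract does not include the kernel itself, so the following is only a minimal sketch of how block-wise multiplication, AVX, OpenMP, and software prefetching can fit together. It uses plain loop tiling, not the authors' recursive scheme. The names `matmul_blocked` and `max_error_vs_naive`, the tile size `BS`, and the one-row prefetch distance are all assumptions made for illustration. `__builtin_prefetch` is a GCC/Clang builtin; the AVX path is compiled only when the compiler defines `__AVX__` (for example with `-mavx`), and the OpenMP pragma takes effect only with `-fopenmp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

#define BS 64 /* tile edge length: an assumed value, not taken from the paper */

/* C += A * B for row-major n x n float matrices (hypothetical helper). */
void matmul_blocked(const float *A, const float *B, float *C, int n)
{
    assert(n > 0);
    /* Each thread works on its own row blocks of C, so no locking is needed. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < n; ii += BS) {
        int i_end = (ii + BS < n) ? ii + BS : n;
        for (int kk = 0; kk < n; kk += BS) {
            int k_end = (kk + BS < n) ? kk + BS : n;
            for (int jj = 0; jj < n; jj += BS) {
                int j_end = (jj + BS < n) ? jj + BS : n;
                for (int i = ii; i < i_end; i++) {
                    float *c = &C[(size_t)i * n];
                    for (int k = kk; k < k_end; k++) {
                        const float *b = &B[(size_t)k * n];
                        float a = A[(size_t)i * n + k];
                        /* Hint: start loading the next row of the B tile
                           while the current row is being processed. */
                        if (k + 1 < k_end)
                            __builtin_prefetch(&B[(size_t)(k + 1) * n + jj], 0, 3);
                        int j = jj;
#ifdef __AVX__
                        /* Process 8 floats per iteration in 256-bit registers. */
                        __m256 va = _mm256_set1_ps(a);
                        for (; j + 8 <= j_end; j += 8) {
                            __m256 vb = _mm256_loadu_ps(b + j);
                            __m256 vc = _mm256_loadu_ps(c + j);
                            vc = _mm256_add_ps(vc, _mm256_mul_ps(va, vb));
                            _mm256_storeu_ps(c + j, vc);
                        }
#endif
                        /* Scalar tail (or the whole row when AVX is unavailable). */
                        for (; j < j_end; j++)
                            c[j] += a * b[j];
                    }
                }
            }
        }
    }
}

/* Largest absolute difference between the blocked kernel and a naive
   triple loop on small test data (hypothetical check, not a benchmark). */
float max_error_vs_naive(int n)
{
    size_t count = (size_t)n * n;
    float *A = malloc(count * sizeof *A);
    float *B = malloc(count * sizeof *B);
    float *C = calloc(count, sizeof *C);
    float *R = calloc(count, sizeof *R);
    assert(A && B && C && R);
    for (size_t t = 0; t < count; t++) {
        A[t] = (float)(t % 7) - 3.0f;
        B[t] = (float)(t % 5) * 0.5f;
    }
    matmul_blocked(A, B, C, n);
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                R[(size_t)i * n + j] += A[(size_t)i * n + k] * B[(size_t)k * n + j];
    float err = 0.0f;
    for (size_t t = 0; t < count; t++) {
        float d = C[t] - R[t];
        if (d < 0.0f) d = -d;
        if (d > err) err = d;
    }
    free(A); free(B); free(C); free(R);
    return err;
}
```

Calling `max_error_vs_naive(100)` exercises both the partial tiles and the scalar tail, since 100 is a multiple of neither 64 nor 8. Whether a prefetch hint like this one helps at all depends on the hardware prefetcher, the tile size, and the matrix size, so any gain would need to be measured rather than assumed.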
