Abstract
On recent multicore architectures, parallel computing with vectorization based on Intel Advanced Vector Extensions (AVX) has emerged as a way to accelerate mathematical calculations and image processing. In parallel computing, the performance of modern processors depends on many factors, such as the amount of memory, the size of the caches, the number of available processors, and the programming methodology. In addition, AVX supports about a hundred vector intrinsics for floating-point calculations, which can be optimized by using 256-bit (AVX) or 512-bit (AVX-512) registers. In this paper, we propose a fast recursive matrix multiplication that combines AVX and OpenMP with software prefetching for block-wise matrix-matrix multiplication on shared memory. Prefetching attempts to reduce memory latency by fetching memory pages before they are used. For matrices of size 8192×8192 on an Intel Core i7 processor, the version with prefetching reduced execution time by about 22% and improved performance by about 17% compared with the version without prefetching.
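The combination described above (block-wise multiplication, AVX, OpenMP, and software prefetching) can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain iterative tiling rather than the recursive scheme, and it assumes single-precision, row-major square matrices. The function name `matmul_tiled`, the helper `min_sz`, the tile size `TILE`, and the choice to prefetch the next row of the current B tile are all hypothetical. `__builtin_prefetch` is a GCC/Clang builtin, and the AVX path is compiled only when the compiler defines `__AVX__`; otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>   /* AVX intrinsics, only when AVX is enabled */
#endif

#define TILE 64          /* hypothetical tile edge; tune to the cache size */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for n x n row-major float matrices. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* Each thread owns a band of rows of C, so no two threads
       ever write the same element. */
    #pragma omp parallel for schedule(static)
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_sz(ii + TILE, n);
                size_t k_end = min_sz(kk + TILE, n);
                size_t j_end = min_sz(jj + TILE, n);

                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        float a = A[i * n + k];
                        size_t j = jj;

                        /* Hint that the next row of the B tile will be
                           read soon (read access, high locality). */
                        if (k + 1 < k_end)
                            __builtin_prefetch(&B[(k + 1) * n + jj], 0, 3);
#ifdef __AVX__
                        /* Process 8 floats per iteration in 256-bit registers. */
                        __m256 va = _mm256_set1_ps(a);
                        for (; j + 8 <= j_end; j += 8) {
                            __m256 vb = _mm256_loadu_ps(&B[k * n + j]);
                            __m256 vc = _mm256_loadu_ps(&C[i * n + j]);
                            vc = _mm256_add_ps(vc, _mm256_mul_ps(va, vb));
                            _mm256_storeu_ps(&C[i * n + j], vc);
                        }
#endif
                        /* Scalar loop for the leftover columns, or for
                           everything when AVX is not enabled. */
                        for (; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

With GCC or Clang, flags such as `-O2 -mavx -fopenmp` enable both the vector path and the OpenMP pragma; without them the code still compiles and runs serially. Whether the prefetch hint helps depends on the tile size and the hardware prefetcher, so it should be measured rather than assumed.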