Abstract

Matrix-vector multiplication is an important component of a large number of parallel algorithms. Over the years, many parallel formulations of matrix-vector multiplication have been proposed. However, they all tend to suffer from the same basic problem: while they may perform well for square matrices, or matrices with moderate aspect ratios, their efficiency deteriorates considerably for matrices with large aspect ratios. This paper proposes novel techniques for improving the efficiency of matrix-vector multiplication for matrices with large aspect ratios. The basic approach involves partitioning the matrix and vector over a logical array of processors, which is then embedded in the physical architecture. The dimensions of the logical array are chosen so as to minimise the communication overhead associated with the algorithm. Two popular families of parallel architectures are considered: square meshes with wraparound connections, and hypercubes. Theoretical results show that, for large numbers of processors and for matrices with large aspect ratios, the new schemes perform significantly better than existing ones.
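
To make the idea of shape-aware logical arrays concrete, the following is a minimal sketch, not the paper's actual scheme: it assumes a simplified cost model in which each processor in an r x c logical array communicates on the order of m/r partial results along rows and n/c vector elements along columns of an m x n matrix. The function name, the cost expression, and the example figures are illustrative assumptions only.

```python
# Hypothetical illustration: pick the dimensions (r, c) of a logical
# r x c processor array for an m x n matrix-vector product, so that the
# array's shape follows the matrix's aspect ratio rather than being fixed
# to a square grid. The cost model below (m/r + n/c per processor) is a
# simplifying assumption, not the model used in the paper.

def choose_logical_array(p, m, n):
    """Return (r, c) with r * c == p that minimises the proxy
    communication volume m / r + n / c."""
    best = None
    for r in range(1, p + 1):
        if p % r:
            continue  # only factorisations of p are valid array shapes
        c = p // r
        cost = m / r + n / c
        if best is None or cost < best[0]:
            best = (cost, r, c)
    return best[1], best[2]

if __name__ == "__main__":
    # A very tall matrix: the chosen array is elongated to match its shape.
    print(choose_logical_array(64, 1 << 20, 1 << 8))   # -> (64, 1)
    # A square matrix: the chosen array is the usual square grid.
    print(choose_logical_array(64, 1 << 10, 1 << 10))  # -> (8, 8)
```

Under this toy model, a matrix with a large aspect ratio pulls the logical array away from the square shape that conventional formulations assume, which is the intuition behind the schemes analysed in the paper.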
