Abstract

In this paper we present a new data partitioning algorithm to improve the performance of parallel matrix multiplication of dense square matrices on heterogeneous clusters. Existing algorithms either use single-speed performance models, which are too simplistic, or do not attempt to minimise the total volume of communication. The functional performance model (FPM) is more realistic than single-speed models because it captures many important features of heterogeneous processors, such as processor heterogeneity, the heterogeneity of the memory structure, and the effects of paging. To balance the computational load, the new algorithm uses FPMs to compute the area of the rectangle that is assigned to each processor. The total volume of communication is then minimised by choosing a shape and ordering of the rectangles so that the sum of their half-perimeters is minimised. Experimental results demonstrate that the new algorithm can reduce the total execution time of parallel matrix multiplication in comparison with existing algorithms.

Keywords: parallel matrix multiplication, functional performance models, heterogeneous platforms, load balance, data partitioning
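The two steps described in the abstract can be illustrated with a minimal sketch. It is not the algorithm of the paper: the names `balance_areas` and `half_perimeter_sum` are hypothetical, each FPM is represented simply as a Python function from assigned area to speed, and the execution time x / s(x) is assumed to be continuous and increasing in x. The first function finds areas that give all processors approximately equal execution time by bisection; the second evaluates the sum of half-perimeters for a given set of rectangle shapes, which is the quantity the abstract says is minimised.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an FPM: speed as a function of assigned area.
SpeedFn = Callable[[float], float]


def balance_areas(speed_fns: Sequence[SpeedFn], total: float,
                  iters: int = 60) -> List[float]:
    """Split `total` area so all processors finish at about the same time.

    Assumption: each execution time x / s(x) is continuous and increasing in x.
    """

    def run_time(s: SpeedFn, x: float) -> float:
        return x / s(x) if x > 0 else 0.0

    def largest_area_within(s: SpeedFn, t: float) -> float:
        # Bisection on the area: largest x in [0, total] with run_time <= t.
        lo, hi = 0.0, total
        for _ in range(iters):
            mid = (lo + hi) / 2
            if run_time(s, mid) <= t:
                lo = mid
            else:
                hi = mid
        return lo

    # Bisection on the common finish time t until the areas sum to `total`.
    t_lo = 0.0
    t_hi = max(run_time(s, total) for s in speed_fns)
    for _ in range(iters):
        t = (t_lo + t_hi) / 2
        if sum(largest_area_within(s, t) for s in speed_fns) < total:
            t_lo = t
        else:
            t_hi = t
    return [largest_area_within(s, t_hi) for s in speed_fns]


def half_perimeter_sum(rects: Sequence[Tuple[float, float]]) -> float:
    """Sum of (width + height) over all rectangles."""
    return sum(w + h for w, h in rects)
```

For instance, with two constant-speed processors in the ratio 1:3, `balance_areas([lambda x: 1.0, lambda x: 3.0], 1.0)` returns areas close to 0.25 and 0.75. For four equal areas on the unit square, four horizontal strips of size 1 × 0.25 give a half-perimeter sum of 5.0, whereas a 2 × 2 grid of 0.5 × 0.5 squares gives 4.0, which shows why the shape of the rectangles matters once the areas are fixed.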
