Abstract

The general block distribution of a matrix is a rectilinear partition of the matrix into orthogonal blocks such that the maximum sum of the elements within a single block is minimized. This corresponds to partitioning the matrix onto parallel processors so as to minimize processor load while maintaining regular communication patterns. Applications of the problem include various parallel sparse matrix computations, compilers for high-performance languages, particle in cell computations, video and image compression, and simulations associated with a communication network. We analyze the performance guarantee of a natural and practical heuristic based on iterative refinement, which has previously been shown to give good empirical results. When p2 is the number of blocks, we show that the tight performance ratio is θ(p). When the matrix has rows of large cost, the details of the objective function of the algorithm are shown to be important, since a naive implementation can lead to a Ω(p) performance ratio. Extensions to more general cost functions, higher-dimensional arrays, and randomized initial configurations are also considered.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.