Abstract

An efficient technique for partitioning and programming linear algebra algorithms on concurrent architectures is described and applied to 2-D wavefront arrays. The mapping of the computational elements (processes) to processors is based on the concept of folding. The mapping pattern on the full-size 2-D mesh of processes is a composition of symmetric tiles of size 2√N × 2√N, where N is the number of processors. The algorithm can be partitioned according to a globally sequential, locally parallel scheme. Code optimisation is performed by programming a small number of tile types, which depend on the algorithm. When the size of the problem is much larger than the size of the mesh of processors, a linear speed-up is achieved independently of the number of processors. Experimental results are presented for matrix multiplication, LU decomposition and the solution of triangular systems of equations on 2-D meshes of transputers programmed in Occam.
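The folding idea can be illustrated with a minimal sketch. The abstract does not give the exact mapping rule, so the reflection rule below is an assumption chosen only because it yields symmetric tiles of side 2√N on a √N × √N processor mesh; the names `fold_index`, `map_process` and `tile_pattern` are hypothetical and do not come from the paper.

```python
import math


def fold_index(k, p):
    """Reflect process index k onto a processor index in [0, p).

    Assumed rule: the index pattern repeats with period 2p and is
    mirrored in the second half of each period (0..p-1, then p-1..0).
    """
    r = k % (2 * p)
    return r if r < p else 2 * p - 1 - r


def map_process(i, j, n_procs):
    """Map process (i, j) of the full-size mesh to a processor (row, col).

    n_procs is N, the number of processors, assumed to be a perfect
    square so that the processors form a sqrt(N) x sqrt(N) mesh.
    """
    p = math.isqrt(n_procs)
    if p * p != n_procs:
        raise ValueError("n_procs must be a perfect square")
    return fold_index(i, p), fold_index(j, p)


def tile_pattern(n_procs):
    """Return one 2*sqrt(N) x 2*sqrt(N) tile of processor coordinates."""
    side = 2 * math.isqrt(n_procs)
    return [[map_process(i, j, n_procs) for j in range(side)]
            for i in range(side)]


if __name__ == "__main__":
    # Print the tile for N = 4 processors (a 2 x 2 mesh, 4 x 4 tile).
    for row in tile_pattern(4):
        print(row)
```

Under this assumed rule, a tile for N = 4 is 4 × 4, the indices along each axis fold as 0, 1, 1, 0, and neighbouring processes always land on the same processor or on adjacent ones, so mesh communication stays local.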
