We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. These solvers are highly parallel, and they perform a factor of $0.4P^{1/6}$ less communication than existing algorithms, where $P$ is the number of processors. The new solvers reduce communication at the expense of using more temporary storage. Previously, algorithms that reduce communication by using more memory were known only for matrix multiplication. Our algorithms are recursive, elegant, and relatively simple to implement. We have implemented them using MPI, a message-passing library, and tested them on a cluster of workstations.