In this paper, we examine ways of solving dense, banded linear systems on different parallel processors. We start with some considerations for processors with vector instructions, then discuss various algorithms for solving large, dense, banded systems on a parallel processor. We analyze the behavior of the parallel algorithms on distributed-storage architectures configured as rings, two-dimensional meshes with end-around connections (tori), and Boolean n-cubes, as well as on bus-based and switch-based machines with shared storage. We also present measurements for two bus-based architectures with shared storage, namely the Alliant FX/8 and the Sequent Balance 21000.