Given N matrices A <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> , A <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2,</sub> ...,A <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sub> of size NtimesN, the matrix chain product problem is to compute A <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> timesA <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> times...timesA <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sub> . Given an NtimesN matrix A, the matrix powers problem is to calculate the first N powers of A, that is, A, A <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> , A <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">3</sup> ,..., A <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sup> . We solve the two problems on distributed memory systems (DMSs) with p processors that can support one-to-one communications in T(p) time. Assume that the fastest sequential matrix multiplication algorithm has time complexity O(N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">alpha</sup> ), where the currently best value of a is less than 2.3755. Let p be arbitrarily chosen in the range 1lesplesN <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">alpha+1</sup> /(log N) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> . We show that the two problems can be solved by a DMS with p processors in T <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">chain</sub> (N,p)=O((N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">alpha+1</sup> /p)+T(p))((N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2(2+1/alpha</sup> /p <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2/alpha</sup> )(log+p/N) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1-2/alpha</sup> +log+((p log N)/N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">alpha</sup> )) and T <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">power</sub> (N,p)=O(N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">alpha+1</sup> /p+T(p)((N <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2(1+1/alpha)</sup> /p <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2/alpha</sup> )(log+p/2 log N) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1-2/alpha</sup> +(log N) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ))) times, respectively, where the function log+ is defined as follows: log+ x=log x if xges1 and log+ x=1 if 0<x<1. We also give instantiations of the above results on several typical DMSs and show that computing matrix chain product and matrix powers are fully scalable on distributed memory parallel computers (DMPCs), highly scalable on DMSs with hypercubic networks, and not highly scalable on DMSs with mesh and torus networks.
Read full abstract