Abstract

We develop a GPU-accelerated variant of the blocked Bartels-Stewart algorithm for the generalized Lyapunov equation. We analyze the influence of optimized data alignment on the GPU and discuss the mathematical problem behind it. Reordering eigenvalues avoids alignment problems on the GPU and accelerates the computation. The solutions of Lyapunov and generalized Lyapunov equations play a key role in many applications in systems and control theory. Their stable numerical computation, when the full solution is sought, has been considered solved since the seminal work of Bartels and Stewart [R. H. Bartels, G. W. Stewart, Solution of the matrix equation AX + XB = C: Algorithm 432, Comm. ACM 15 (1972) 820-826]. A number of variants of their algorithm have been proposed, but none of them goes beyond a BLAS level-2 style implementation. On modern computers, however, BLAS level-3 type implementations are crucial for making optimal use of cache hierarchies and of modern block scheduling methods based on directed acyclic graphs that describe the interdependence of single block computations. In this contribution, we present the port of our recent BLAS level-3 algorithm [M. Kohler, J. Saak, On BLAS Level-3 implementations of common solvers for (quasi-) triangular generalized Lyapunov equations, SLICOT Working Note 2014-1, NICONET e.V., available from www.slicot.org (Sep. 2014)] to a GPU accelerator device.

