Abstract

The domain decomposition method (DDM) is an efficient algorithmic tool for parallelizing finite element codes. One variant of the DDM, which uses a direct solution algorithm, is based on computing Schur complement matrices for the finite element partitions. This paper describes a simple technique that considerably improves the execution rate of the computationally intensive routines in the Schur complement computation. The technique uses ‘block of columns’ matrix operations and loop unrolling to reduce the number of load instructions from cache memory and to increase instruction-level parallelism. Experimental results on superscalar RISC processors show that the performance of the DDM solution procedure can be improved several-fold.
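The abstract does not show the routines themselves, so the following is only a minimal sketch of what a ‘block of columns’ update with loop unrolling might look like. It assumes the kernel is a dense update of the form C := C - A·B, as arises when forming a Schur complement, and it assumes a block width of four columns. The function names `update_plain` and `update_block4` are hypothetical. Python is used here only to make the loop structure explicit; the reduction in loads and the gain in instruction-level parallelism described above apply to compiled code on the target processor.

```python
def update_plain(C, A, B):
    """Reference kernel: C -= A * B, computed one column of C at a time."""
    n, m, p = len(A), len(B), len(B[0])
    for j in range(p):
        for i in range(n):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] -= s


def update_block4(C, A, B):
    """Same update, processing a block of four columns of C per pass.

    Each A[i][k] is read once and reused for four multiply-adds, and the
    four accumulators do not depend on one another. The block width of 4
    is an assumption made for illustration.
    """
    n, m, p = len(A), len(B), len(B[0])
    p4 = p - p % 4  # number of columns covered by full blocks of four
    for j in range(0, p4, 4):
        for i in range(n):
            s0 = s1 = s2 = s3 = 0.0
            for k in range(m):
                a = A[i][k]        # one load of A, reused four times
                row = B[k]
                s0 += a * row[j]
                s1 += a * row[j + 1]
                s2 += a * row[j + 2]
                s3 += a * row[j + 3]
            Ci = C[i]
            Ci[j] -= s0
            Ci[j + 1] -= s1
            Ci[j + 2] -= s2
            Ci[j + 3] -= s3
    # Remainder columns that do not fill a whole block of four.
    for j in range(p4, p):
        for i in range(n):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] -= s
```

Both functions accumulate each entry in the same order over k, so they should produce identical results; only the number of times each element of A is read, and the number of independent accumulation chains, differs.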
