Interprocessor communication for high performance, explicit time integration

Georgios Petropoulos, Gregory L Fenves

doi:10.1007/s00366-010-0174-x

Abstract

Parallel, explicit finite element analysis is based almost exclusively on point-to-point interprocessor communication. However, point-to-point communication on multicore architectures results in large performance variability because of shared caches and sockets. The interprocessor communication required during the solution phase must be designed to achieve a high degree of scalability and performance for explicit time integration operators. An analysis of point-to-point communication across different hardware platforms, communication library implementations, and message sizes demonstrates the need for a flexible software design that allows for optimization. Autotuning modules and preliminary performance tests are necessary to identify the optimal combination of calls. Performance differences of point-to-point messaging on multicore machines are illustrated with a test that uses combinations of MPI communication calls. The differences are apparent when caches and sockets are shared among the cores and for message sizes up to 1.5 MB. Alternative communication schemes are shown to perform faster, depending on the architecture and message size. Nearly linear scalability for explicit time integration is demonstrated using these design techniques.

Full Text