Abstract

A model to estimate the performance of graph partitioning running on heterogeneous multi-core clusters is proposed. We discover pitfalls of conventional methodologies in obtaining model parameters from multi-core systems. The impact of intra-node contention is too significant to be ignored. Modeling accuracy depends on whether overlap is adequately considered. Characteristics of input meshes may affect memory access behavior and hence become a determining factor.

Considering application behavior in graph partitioning is an arduous task because of a chicken-and-egg problem: the application behavior depends on how the graph is decomposed, while achieving load balance requires knowledge of how the application utilizes the underlying resources. Advances in multi-core processors further complicate the endeavor by introducing hardware diversity and intra-node contention. As an attempt to quantify performance for partitioning refinement, we propose a model that predicts execution times of iterative mesh-based applications running on heterogeneous multi-core clusters. Apart from resource heterogeneity, the model takes into account hierarchical communication characteristics, overlap between computation and communication, and performance penalties due to intra-node contention. We present a detailed methodology for obtaining key parameters from a real system and highlight potential pitfalls of conventional approaches to obtaining them. Experiments were conducted using a synthetic application benchmark that solves a partial differential equation. Evaluation shows good agreement between measured execution times and the model's predictions.
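The abstract does not give the model's equations, so the following is only an illustrative sketch of how the listed ingredients (heterogeneous compute speed, hierarchical communication, overlap, and intra-node contention) could combine into a per-iteration estimate. The class, the field names, and the specific cost form are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PartitionCost:
    """Hypothetical per-process cost inputs for one iteration, in seconds."""
    compute: float          # time to update the local sub-mesh with no contention
    intra_node_comm: float  # halo exchange with processes on the same node
    inter_node_comm: float  # halo exchange with processes on other nodes
    contention: float       # assumed slowdown factor (>= 1.0) from sharing a node
    overlap: float          # assumed fraction of communication hidden, 0.0 to 1.0


def iteration_time(p: PartitionCost) -> float:
    """Estimate one iteration for one process under the assumed cost form."""
    compute = p.compute * p.contention
    comm = p.intra_node_comm + p.inter_node_comm
    # Communication can hide behind computation, but never more than
    # the computation time that is available to hide it.
    hidden = min(p.overlap * comm, compute)
    return compute + comm - hidden


def predicted_runtime(parts: Sequence[PartitionCost], iterations: int) -> float:
    """Iterations synchronize, so the slowest process sets the pace."""
    return iterations * max(iteration_time(p) for p in parts)
```

In this sketch a partitioner could compare candidate decompositions by their `predicted_runtime` rather than by vertex counts alone, which is the kind of use the abstract describes as partitioning refinement. How the real model measures and combines these terms is covered in the full text.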
