Abstract

This article presents optimization techniques for two Python-based, large-scale social science applications: SN (Social Network) Simulator and KPM (Kernel Polynomial Method). Both applications use MPI to transfer data between computing processes, which in the regular implementation leads to load imbalance and performance degradation. To avoid this effect, we propose a two-stage optimization: in the first stage, the order of tasks is changed; in the second, the tasks are divided into smaller ones that are easier to allocate. In addition, we focus on mitigating performance and memory bottlenecks on modern ccNUMA systems with multiple NUMA domains. The performance analysis revealed limitations in data traffic between and within processors, which were resolved through appropriate data allocation. Benchmarks were carried out in various environments, covering traditional x86-64 and ARM-based HPC processors from different vendors.
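
The abstract does not spell out how tasks are reordered or split, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a generic longest-task-first greedy heuristic for the reordering stage and an equal-size split for the second stage; the function names `split_tasks`, `assign`, and `imbalance`, and the `max_cost` threshold, are hypothetical. MPI is left out: the "workers" here stand in for ranks.

```python
import heapq
import math


def split_tasks(costs, max_cost):
    """Stage 2 (assumed): cut every task larger than max_cost into equal pieces."""
    pieces = []
    for cost in costs:
        n = max(1, math.ceil(cost / max_cost))  # number of pieces for this task
        pieces.extend([cost / n] * n)
    return pieces


def assign(costs, n_workers):
    """Stage 1 (assumed): take tasks largest-first, give each to the least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    loads = [0.0] * n_workers
    for cost in sorted(costs, reverse=True):
        load, w = heapq.heappop(heap)
        loads[w] = load + cost
        heapq.heappush(heap, (loads[w], w))
    return loads


def imbalance(loads):
    """Ratio of the heaviest worker's load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    costs = [10, 1, 1, 1, 1, 1, 1]  # one dominant task, made-up numbers
    reordered = assign(costs, n_workers=2)
    reordered_and_split = assign(split_tasks(costs, max_cost=2), n_workers=2)
    print(reordered, imbalance(reordered))
    print(reordered_and_split, imbalance(reordered_and_split))
```

In this toy case reordering alone cannot balance the load, because the single dominant task exceeds a fair share; splitting it first gives the scheduler pieces small enough to distribute evenly.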
