Abstract
Exploiting multiple levels of parallelism benefits a wide range of applications, because a typical server now has tens of processor cores. As the number of cores per machine grows rapidly, efficient support for nested parallelism will become increasingly important. We observe that different task-core mapping schemes can lead to significant performance differences because modern HPC servers are NUMA multi-core systems, so it is important to control the task-core mapping for nested parallelism. However, the thread-count management mechanisms in current parallel programming models, such as OpenMP, do not give runtime systems enough information to make optimized decisions. As a result, current nested parallel applications often suffer from suboptimal task-core mapping and significant performance loss. To address this problem, we propose NestedMP, a set of directives that extends OpenMP. NestedMP specifies the number of threads of each nested parallel branch declaratively and lets the runtime system see the whole task tree, so that it can make a locality-aware task-core mapping. We have implemented NestedMP in GCC 4.8.2 and evaluated it on a 4-way 8-core Sandy Bridge server. The results show that NestedMP improves performance significantly over GCC's OpenMP implementation.
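To illustrate why the task-core mapping matters on a NUMA machine, the following is a minimal sketch, not NestedMP's actual algorithm. It models the 4-socket, 8-cores-per-socket server mentioned above and compares two placements of a two-level task tree. The function names (`compact_mapping`, `scattered_mapping`, `sockets_per_branch`) and the assumption that cores are numbered consecutively within each socket are hypothetical and chosen only for this example.

```python
# Toy model of task-core mapping for a two-level nested parallel region.
# Assumption: cores 0-7 sit on socket 0, cores 8-15 on socket 1, and so on.
SOCKETS = 4            # "4-way" server
CORES_PER_SOCKET = 8   # 8 cores per socket


def socket_of(core):
    """Return the socket (NUMA node) that owns a core under the assumed numbering."""
    return core // CORES_PER_SOCKET


def compact_mapping(outer, inner):
    """Locality-aware placement: each outer branch's inner team gets consecutive cores."""
    assert outer * inner <= SOCKETS * CORES_PER_SOCKET
    return {(o, i): o * inner + i for o in range(outer) for i in range(inner)}


def scattered_mapping(outer, inner):
    """Locality-oblivious placement: threads are dealt round-robin across sockets."""
    assert outer * inner <= SOCKETS * CORES_PER_SOCKET
    mapping = {}
    for o in range(outer):
        for i in range(inner):
            t = o * inner + i                      # global thread index
            mapping[(o, i)] = (t % SOCKETS) * CORES_PER_SOCKET + t // SOCKETS
    return mapping


def sockets_per_branch(mapping, outer):
    """Count how many distinct sockets each outer branch's inner team touches."""
    return [
        len({socket_of(core) for (o, _), core in mapping.items() if o == branch})
        for branch in range(outer)
    ]


if __name__ == "__main__":
    # 4 outer branches, each spawning 8 inner threads (32 threads on 32 cores).
    print(sockets_per_branch(compact_mapping(4, 8), 4))    # [1, 1, 1, 1]
    print(sockets_per_branch(scattered_mapping(4, 8), 4))  # [4, 4, 4, 4]
```

In the compact placement every inner team stays on one socket, so threads that share data also share a NUMA node. In the scattered placement every team spans all four sockets. A runtime can only choose the first placement if it knows the inner team sizes before it places the outer threads, which is the information the abstract says a declarative specification makes available.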