Abstract

Parallel programming across many CPU cores poses challenges for software design, such as mitigating the performance or efficiency loss in applications whose cores reach synchronization points at different times. Existing solutions often address this through careful optimization of the application design, or react to the imbalance at run time by throttling the frequency of the cores that finish early. In this work, we propose a method to rebalance bulk-synchronous MPI applications by selectively speeding up the late-finishing cores throughout application run time. The algorithm uses the new Intel® Speed Select Turbo Frequency feature, which lets software guide the hardware to raise the turbo frequency limits of some cores in exchange for lower turbo frequency limits on other cores. We demonstrate up to 40% energy reduction and 17% execution time reduction in a highly imbalanced, compute-bound benchmark application, and up to 21% energy reduction with a 5% execution time reduction in an imbalanced real-world application.
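The step of deciding which late-finishing cores should receive the higher turbo frequency limits can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `select_high_priority_cores`, the median-based lateness threshold, and the `max_high` cap are hypothetical assumptions, and the step that actually applies the priorities through Intel® Speed Select Turbo Frequency is deliberately omitted.

```python
from statistics import median


def select_high_priority_cores(compute_times, max_high, slack=0.05):
    """Hypothetical policy: pick late-finishing cores to receive higher turbo limits.

    compute_times: mapping core id -> time (seconds) the core spent computing
        before reaching the last synchronization point.
    max_high: assumed cap on how many cores may hold high priority at once.
    slack: a core counts as "late" if its time exceeds the median by this fraction.
    """
    if not compute_times or max_high <= 0:
        return set()
    mid = median(compute_times.values())
    # Cores noticeably slower than the typical core delay the synchronization point.
    late = [core for core, t in compute_times.items() if t > mid * (1 + slack)]
    # Slowest cores first, so the cores that hold up the barrier most are sped up.
    late.sort(key=lambda core: compute_times[core], reverse=True)
    return set(late[:max_high])


if __name__ == "__main__":
    times = {0: 1.00, 1: 1.02, 2: 1.40, 3: 0.98, 4: 1.25}
    print(select_high_priority_cores(times, max_high=1))
    print(select_high_priority_cores(times, max_high=4))
```

With the sample timings, cores 2 and 4 exceed the median (1.02 s) by more than 5%, so they are the candidates; a cap of one core selects only core 2, the slowest. Selecting relative to the median rather than the fastest core is one simple way to ignore small, noise-level differences; a real policy would also need to account for how many high-priority cores the hardware configuration supports.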
