Abstract

Tiled many-core processors (e.g., KNL and the TILE-Gx72 processor), in which processing cores are integrated onto a single chip and interconnected via mesh-based networks, differ from traditional many-core systems. Their operating system (OS) should be optimized for the unique characteristics of tiled many-core processors (for instance, all cores are integrated onto a single chip), because these characteristics were not taken into consideration when OSes designed for traditional multicore (many-core) systems were deployed on them. In this paper, we propose an optimized load balancing policy to improve the performance of multi-threaded applications. Letting a thread select an appropriate idle (or lightly loaded) tile (processing core) from all tiles on the chip, rather than from only a portion of them, reduces the overhead incurred by the load balancing policy, the cache-miss penalty caused by scheduling and by more threads sharing the same tile, and the contention for memory controllers due to cache misses. The experimental results demonstrate that the optimized load balancing policy provides up to 2.7× performance improvement on KNL and mitigates performance degradation to varying extents on the TILE-Gx72 processor.

Highlights

  • Scalability problems, in which the execution time of a multithreaded application designed to exploit software-level parallelism cannot be reduced as more threads cooperate in the parallel phase(s), remain a challenge for application programmers, library designers, and operating system (OS) designers

  • Since tiles, remote memory controllers, and local memory controllers are integrated onto the same chip, and non-uniform memory access latency does not dominate program performance on tiled many-core processors, a blocked thread can be awakened on any idle tile on the chip, rather than only within its previous scheduling domain, which covers just a portion of the tiles


Summary

INTRODUCTION

Scalability problems, in which the execution time of a multithreaded application designed to exploit software-level parallelism (and hardware-level parallelism) cannot be reduced as more threads (processing cores) cooperate in the parallel phase(s), remain a challenge for application programmers, library (e.g., heap manager) designers, and operating system (OS) designers. We show that the performance of multi-threaded shared-memory applications designed for chip multiprocessors can be improved when the load balancing policy in the Linux kernel is optimized for tiled many-core processors. Since tiles (processing cores), remote memory controllers, and local memory controllers are integrated onto the same chip, and non-uniform memory access latency does not dominate program performance on tiled many-core processors, a blocked thread can be awakened on any idle (or lightly loaded) tile (processing core) on the chip, rather than only within its previous scheduling domain, which covers just a portion of the tiles. This observation underlies the optimized load balancing policy in the Linux kernel proposed in this paper.
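
The difference between waking a thread within its previous scheduling domain and waking it anywhere on the chip can be illustrated with a minimal sketch. This is not the Linux kernel's code or the paper's implementation; the function `pick_wake_tile`, the per-tile `load` list (a stand-in for run-queue length), and the `domain` list are hypothetical names introduced only for illustration.

```python
from typing import Sequence


def pick_wake_tile(load: Sequence[int], domain: Sequence[int], chip_wide: bool) -> int:
    """Return the tile on which to wake a blocked thread.

    load[i]  : hypothetical run-queue length of tile i (0 means idle).
    domain   : tiles in the thread's previous scheduling domain.
    chip_wide: if True, consider every tile on the chip (the idea in the text);
               if False, consider only the previous scheduling domain.
    """
    candidates = range(len(load)) if chip_wide else domain
    # Choose the least loaded candidate; ties go to the lowest tile id.
    return min(candidates, key=lambda tile: (load[tile], tile))


# Toy 8-tile chip: tiles 4 and 5 are idle but lie outside the old domain.
load = [2, 1, 3, 1, 0, 0, 4, 2]
domain = [0, 1, 2, 3]

print(pick_wake_tile(load, domain, chip_wide=False))  # restricted to the domain
print(pick_wake_tile(load, domain, chip_wide=True))   # any tile on the chip
```

In this toy setup the domain-restricted search settles for a tile that is already running a thread, while the chip-wide search finds an idle tile. Real schedulers weigh further factors (such as cache affinity) that this sketch deliberately ignores.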

BACKGROUND
PERFORMANCE EVALUATION
DISCUSSION
Findings
RELATED WORK
CONCLUSION AND FUTURE WORK