Tiled many-core processors integrate many simple cores onto a single chip to exploit software-level parallelism, and they interconnect these cores via mesh-based networks to mitigate overheads of traditional interconnects, such as limited throughput. As these processors become more prevalent, one overlooked problem is that operating system (OS) designers are likely to assume that such processors, which have multiple on-chip memory controllers, are non-uniform memory access (NUMA) systems. In this paper, we define novel models that distinguish uniform memory access (UMA) from NUMA on tiled many-core processors from the perspective of the cache system, to help OS designers and application programmers fully understand the underlying hardware. Whether a tiled many-core processor is a NUMA system is determined by its cache system rather than by the number of memory controllers it has. Together with the experimental results, the models explain why a significant performance difference is observed in some cases and not in others on KNL and TILE-Gx72.