Processor architecture design is shifting towards Many-Core Architectures (MCAs) to meet the future demands of high-performance computing as Moore’s Law approaches its limits. Many-core processors use shared memory hierarchies to build high-speed memory systems and improve memory access efficiency. However, as the number of cores grows, the scalability of such systems is significantly constrained by the rising proportion of long-distance and Non-Uniform Memory Access (NUMA). Improving the scalability of MCAs is therefore crucial for achieving large-scale and super-scale general-purpose many-core processors. This work proposes a highly scalable memory Network-on-Chip (NoC) for the Triplet-Based Many-Core Architecture (TriBA), named TriBA-mNoC. TriBA-mNoC maintains a consistent core-to-core spacing as the network scale increases, which prevents long-distance memory access latency from growing. Moreover, it leverages the inherent advantage of shared-inside hierarchical groupings, alleviating common NUMA issues in NoC design. Evaluations of static network characteristics show that TriBA-mNoC outperforms most classical NoCs in network diameter, average distance, and cost. TriBA-mNoC can be integrated with TriBA on the same silicon die with a tile-like floorplan, forming a novel NoC called TriBA-NoC, which combines the strengths of both networks to maximize architecture performance. We evaluated the memory access performance and scalability of TriBA-NoC using mathematical evaluation models and simulations with real traffic (PARSEC 3.0 and SPLASH-2) at different network scales. The mathematical evaluation results indicate that TriBA-NoC achieves an aggregate speedup of approximately 3x compared with 2D-Mesh for a similar number of cores. Furthermore, under the same cache hit ratio, TriBA-NoC’s single-core speedup efficiency remains stable as the number of cores increases, while that of 2D-Mesh declines rapidly, highlighting TriBA-NoC’s scalability. Finally, the real-traffic simulation results show that, compared with 2D-Mesh, TriBA-NoC reduces average memory access latency and time by 25.90%-40.50% and 5.61%-31.69%, respectively.
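
As an illustration of the static network characteristics mentioned above, the following minimal sketch computes the network diameter and average distance of the 2D-Mesh baseline by breadth-first search. It is not the authors' evaluation model and does not model TriBA-mNoC, whose topology is not specified in this abstract. The function names and the assumption of a square k x k mesh (k >= 2) with one router per node and unit-length links are ours.

```python
from collections import deque


def mesh_neighbors(node, k):
    """Yield the up/down/left/right neighbors of `node` in a k x k mesh."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < k and 0 <= ny < k:
            yield (nx, ny)


def bfs_distances(src, k):
    """Return hop counts from `src` to every node of the k x k mesh."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nb in mesh_neighbors(cur, k):
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist


def mesh_static_metrics(k):
    """Return (diameter, average distance) of a k x k 2D mesh, k >= 2.

    Diameter is the largest shortest-path hop count between any two nodes;
    average distance is the mean hop count over all ordered pairs of
    distinct nodes (an assumed definition, chosen for illustration).
    """
    nodes = [(x, y) for x in range(k) for y in range(k)]
    diameter, total, pairs = 0, 0, 0
    for src in nodes:
        for dst, d in bfs_distances(src, k).items():
            if dst != src:
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return diameter, total / pairs


if __name__ == "__main__":
    for k in (2, 4, 8):
        diameter, avg = mesh_static_metrics(k)
        print(f"{k}x{k} mesh: diameter={diameter}, average distance={avg:.3f}")
```

In a k x k mesh the diameter grows as 2(k - 1), so the worst-case hop count keeps rising with the core count; this is the kind of growth in long-distance access that the abstract says limits scalability.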