Abstract

This paper is devoted to the design of the communication and memory architectures of massively parallel hardware multiprocessors needed to implement highly demanding applications. We demonstrated that, for massively parallel hardware multiprocessors, the traditionally used flat communication architectures and multi-port memories do not scale well, and that the influence of the memory and communication network on both throughput and circuit area dominates that of the processors. To resolve these problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures by exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed data distribution and related data mapping schemes for the shared (global) partitioned memories, with the aim of eliminating memory access conflicts and ensuring that our communication design strategies remain applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and the related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. They also demonstrate that our method and the related framework can efficiently synthesize well-scalable memory and communication architectures even for high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained when using hierarchical communication networks instead of flat networks. However, at high parallelism levels only the partitioned approach ensures scalability in performance.

Highlights

  • Recent spectacular progress in technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs)

  • We proposed to design application-specific hierarchical partitioned organizations of the communication architectures and vectorized memories that exploit the regularity and hierarchy of the actual information flows of a given application

  • We demonstrated that, for moderate parallelism levels, two-level architectures with several small global communication-free clusters or a single global cluster perform well, with performance gains as high as 12 times and area savings as high as 25 times compared to the flat communication scheme


Summary

Introduction

Recent spectacular progress in technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). To decide on the most suitable architecture, the most promising architectures constructed during the design space exploration (DSE) are analyzed in relation to the quality metrics of interest and the basic controllable system attributes affecting them (e.g., the number of accelerator modules of each kind, the clock frequency of each module, the communication structures between modules, and the schedule and binding of the required behavior to the modules), and the results of this analysis are compared to the design constraints and optimization objectives. In this way the designer receives feedback, composed of a set of instantiated architectures and the important characteristics of each of them, showing to what degree each architecture satisfies the particular design objectives and constraints. If all the constraints and objectives are met to a satisfactory degree, the corresponding final application-specific architecture template is instantiated, further analyzed, and refined to represent the actual detailed design of the required accelerator.
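The comparison of candidate architectures against constraints and objectives described above can be illustrated with a minimal sketch. It is not the authors' framework: the `Candidate` type, the two metrics, the function names, and all numeric values are hypothetical, and a simple Pareto filter stands in for whatever selection procedure the actual exploration uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One instantiated architecture with its estimated quality metrics (hypothetical)."""
    name: str
    throughput: float  # higher is better; illustrative unit
    area: float        # lower is better; illustrative unit


def meets_constraints(c, min_throughput, max_area):
    """Check the hard design constraints for one candidate."""
    return c.throughput >= min_throughput and c.area <= max_area


def dominates(a, b):
    """True if a is at least as good as b on both metrics and strictly better on one."""
    return (a.throughput >= b.throughput and a.area <= b.area
            and (a.throughput > b.throughput or a.area < b.area))


def explore(candidates, min_throughput, max_area):
    """Keep the feasible candidates that no other feasible candidate dominates."""
    feasible = [c for c in candidates
                if meets_constraints(c, min_throughput, max_area)]
    return [c for c in feasible
            if not any(dominates(other, c) for other in feasible)]


# Made-up candidates, for illustration only (not results from the paper).
candidates = [
    Candidate("A", throughput=1.0, area=25.0),
    Candidate("B", throughput=12.0, area=1.0),
    Candidate("C", throughput=10.0, area=2.0),
    Candidate("D", throughput=0.5, area=0.5),
]
best = explore(candidates, min_throughput=1.0, max_area=30.0)
print([c.name for c in best])
```

In this sketch the surviving list plays the role of the feedback returned to the designer: each remaining candidate satisfies the constraints, and none is worse than another on both metrics.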

Communication and Memory Architecture Design for High-End Multiprocessors
Case Study
Flat better
Communication and Memory Partitioning Strategies
Conclusion