Abstract

Hardware Accelerators (ACCs) are a promising way to improve the performance and power efficiency of Chip Multi-Processors (CMPs). However, new challenges arise as designs shift from a few ACCs (sparse ACC coverage) to many ACCs (denser ACC coverage) on a chip. The primary challenges are a lack of clear semantics in ACC communication and a processor-centric view of orchestrating the entire system. This paper opens a path toward efficient integration of many ACCs on a single chip. To this end, the paper first identifies four major semantic aspects of communication between two ACCs: data access model, data granularity, marshalling, and synchronization. Based on these semantics, the paper then proposes an efficient architectural solution, Transparent Self-Synchronizing (TSS), that realizes them in the underlying architecture. In principle, TSS proposes a shift from the current processor-centric view to a more equal, peer view between ACCs and the host processors. TSS minimizes interaction with the host processor and reduces the volume of ACC-to-ACC communication traffic exposed to the system fabric. Our results, using 8 streaming applications with varying ACC coverage density, demonstrate significant benefits of TSS, including a 3x speedup over current ACC-based architectures.


