Data Movement Accelerator Engines on a Prototype Power10 Processor

Yutaka Sugawara, Ruud A Haring, Ralph Bellofatto, Krishnan Sugavanam, Ben J Nathanson, Robert M Senger, Dong Chen, Abdullah Kayi, Craig Stunkel, Eugene Ratzlaff

doi:10.1109/mm.2022.3193949

Abstract


This article presents the design and implementation of active messaging engines (AMEs) on an IBM Power10 prototype chip. AMEs are tiny, simple, but fully programmable 64-bit processors, for offloading operations related to data movement. AMEs can offload the execution flow of the message passing interface and other messaging stacks from the host central processing unit, enabling truly asynchronous progress to overlap computation and communication. The AMEs are implemented as onboard OpenCAPI-compliant accelerators, leveraging existing OpenCAPI infrastructure. As realized in a 7-nm technology, each AME takes 0.034 mm² of silicon area and 4.1 mW of power. AME performance is evaluated across several contiguous and noncontiguous memory copy scenarios. AMEs can perform up to the bandwidth limit of their access path to the main memory (32 GB/s) and incur a per-request overhead of about 600 ns. These results indicate that AMEs will confer advantages to general messaging libraries for processing, sending, and receiving on-node and off-node messages.
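
To give a feel for how the two reported figures interact, the sketch below uses a simple first-order cost model: copy time = per-request overhead + bytes / peak bandwidth. This model is an assumption made here for illustration, not something the abstract states; the function names are hypothetical, and 1 GB is taken as 10^9 bytes. Under that model, the request size at which an AME would reach half of its peak bandwidth is overhead × bandwidth = 600 ns × 32 GB/s = 19,200 bytes.

```python
# First-order cost model (an assumption, not taken from the article):
#   time(n) = per-request overhead + n / peak bandwidth
OVERHEAD_S = 600e-9   # ~600 ns per-request overhead (figure from the abstract)
PEAK_BW = 32e9        # 32 GB/s memory-path limit, assuming 1 GB = 1e9 bytes


def copy_time_s(nbytes):
    """Modeled time in seconds to copy nbytes in a single request."""
    return OVERHEAD_S + nbytes / PEAK_BW


def effective_bandwidth(nbytes):
    """Modeled bytes per second achieved for a single request of nbytes."""
    return nbytes / copy_time_s(nbytes)


def half_bandwidth_size():
    """Request size in bytes at which the model reaches half of PEAK_BW."""
    return OVERHEAD_S * PEAK_BW


if __name__ == "__main__":
    # Print modeled effective bandwidth for a few request sizes.
    for n in (256, 4096, 65536, 1 << 20):
        print(f"{n:>8} B: {effective_bandwidth(n) / 1e9:6.2f} GB/s")
```

In this model, requests much smaller than about 19 kB are dominated by the fixed overhead, while much larger requests approach the 32 GB/s limit. The noncontiguous copy scenarios evaluated in the article may behave differently from this single-request approximation.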
