Abstract

Emerging heterogeneous system architectures increasingly integrate general-purpose processors, GPUs, and other specialized computational units to deliver both power and performance benefits. While the motivation for building accelerator-rich systems is clear, the dispatching mechanisms that orchestrate these diverse computational components must be efficient in both performance and energy without sacrificing programmability. In this paper, we present an infrastructure composed of a general-purpose, packet-based hardware dispatching unit, the generic packet processing unit (GPPU), and an associated runtime that provides user-level access to GPPU objects such as packets, queues, and contexts. This design removes the cost of traditional user-to-kernel transitions, hides the low-level accelerator subtleties that hinder programming productivity, and addresses architectural obstacles such as managing the accelerators' unified virtual address space. We present the design and evaluation of our framework by integrating the GPPU infrastructure with two data-streaming accelerators, image filtering and matrix multiplication, tightly coupled to the ARMv8 architecture via unified virtual memory. Under scaling workloads, our proposed dispatching methods deliver up to $3.7\times$ performance improvement over baseline offloading and up to $4.7\times$ better energy efficiency.
