Abstract

To ease parallel programming for heterogeneous architectures, we propose and implement a high-level OpenCL runtime that conceptually merges multiple heterogeneous hardware devices into one virtual heterogeneous compute device (VHCD). The runtime distributes the workload among the devices automatically, based on offline profiling together with new programming directives that define the device-independent data access range of each work-group. As a result, an OpenCL program originally written for a single compute device can, after the insertion of a small number of programming directives, run efficiently on a platform consisting of heterogeneous compute devices. Performance is ensured by a virtual cache management technique that minimizes the amount of host-device data transfer. We evaluate the new OpenCL runtime with a diverse set of OpenCL benchmarks, which demonstrate good performance on various configurations of a heterogeneous system.
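
The idea of profile-driven workload distribution can be illustrated with a minimal sketch. It assumes that offline profiling yields one relative throughput figure per device and that work-groups are handed out as contiguous index ranges in proportion to those figures. The function name `split_work_groups`, the throughput values, and the proportional policy are illustrative assumptions only; the abstract does not specify the runtime's actual distribution algorithm.

```python
from fractions import Fraction


def split_work_groups(num_groups, throughputs):
    """Return one half-open (start, end) range of work-group indices per device.

    Hypothetical helper: range sizes are proportional to the profiled
    throughputs, and leftover groups go to the largest fractional shares.
    """
    if num_groups < 0:
        raise ValueError("num_groups must be non-negative")
    if not throughputs or any(t < 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of non-negative values")

    # Exact rational arithmetic avoids floating-point rounding in the shares.
    weights = [Fraction(t) for t in throughputs]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one throughput must be positive")

    shares = [num_groups * w / total for w in weights]
    counts = [int(s) for s in shares]  # floor, since shares are non-negative

    # Hand the remaining groups to the devices with the largest remainders.
    leftover = num_groups - sum(counts)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1

    # Turn per-device counts into contiguous index ranges.
    ranges = []
    start = 0
    for count in counts:
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # Assumed profile: the first device is three times as fast as the second.
    print(split_work_groups(100, [3, 1]))
```

In a real runtime, each range would then be combined with the per-work-group data access range from the directives to decide which buffer regions every device needs, which this sketch does not model.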
