Abstract

Programming modern embedded vision systems brings various challenges, due to the steep learning curve for programmers and the different characteristics of the devices. Quasar, a new high‐level programming language and development environment, considerably simplifies this development. Quasar has a compiler that detects and optimizes parallel programming patterns and a heterogeneous runtime that distributes the computational load over the available compute devices (CPUs and graphics processing units [GPUs]). In this paper, we focus on the runtime aspects of Quasar. We show that, to a good approximation, the execution time of a GPU kernel function can be factorized into a compile‐time‐specific component and a runtime‐specific component. We show that this approximation leads to a computationally simple runtime load balancing rule. Moreover, the load balancing rule permits efficient implicit concurrency of kernel functions and automatic scaling to multiple compute devices (e.g., multi‐CPU/GPU systems). Based on an appropriate mathematical scheduling model, we investigate the command queue size trade‐off between memory usage and device utilization. The result is a programming environment for embedded vision systems in which automatic parallelization and implicit concurrency detection allow scaling the program efficiently to multi‐CPU/GPU systems. Finally, benchmark results are provided to demonstrate the performance of our approach compared with OpenACC and CUDA (Compute Unified Device Architecture).
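To illustrate the kind of load balancing rule the abstract describes, the sketch below models a kernel's execution time on a device as the product of a compile-time cost factor and the runtime problem size, divided by the device's relative speed, and dispatches to the device with the earliest estimated completion time. This is a hypothetical illustration, not the paper's actual rule: the function `pick_device`, the cost model, and the device fields (`speed`, `queued`) are all assumptions introduced for this example.

```python
# Hypothetical sketch of a factorized-cost load balancing rule.
# Assumed model (not from the paper):
#   t(d) = queued(d) + c_kernel * work_size / speed(d)
# where c_kernel is a compile-time-specific cost factor and
# work_size is the runtime-specific problem size.

def pick_device(c_kernel, work_size, devices):
    """Return the name of the device with the earliest estimated finish time.

    devices: list of dicts with keys 'name', 'speed' (relative throughput),
    and 'queued' (estimated time of work already enqueued on that device).
    """
    best = min(
        devices,
        key=lambda d: d["queued"] + c_kernel * work_size / d["speed"],
    )
    return best["name"]

devices = [
    {"name": "cpu", "speed": 1.0, "queued": 0.0},
    {"name": "gpu", "speed": 8.0, "queued": 0.5},
]

# A large kernel amortizes the GPU's pending work; a tiny kernel does not.
print(pick_device(1.0, 10.0, devices))  # large job -> gpu
print(pick_device(1.0, 0.1, devices))   # tiny job -> cpu
```

The two calls at the end show why such a rule also yields implicit concurrency: small kernels fall back to an idle CPU while large kernels go to the faster but busier GPU, keeping both devices utilized.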

