Abstract

Computer vision applications have stringent performance constraints that must be satisfied when they run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de facto reference standard for developing such applications. OpenVX uses a primitive-based programming model that yields a directed acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimization and synthesis to heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance these applications need when deployed on heterogeneous multi-/many-core platforms. This article addresses this challenge with three main contributions. First, we implement a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT improves system performance by up to 70 percent on one of the most widespread smart systems for computer vision and intelligent video analytics at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that in a vision application for edge computing, where some primitives have multiple implementations (e.g., for CPU and GPU) and others only one, HEFT can lead to load imbalance among heterogeneous computing elements (CEs) and thus to degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single-implementation primitives to improve load balancing. We show that XEFT further improves system performance by up to 33 percent over HEFT and 82 percent over the native OpenVX scheduler.
We present the results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.
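
To make the HEFT heuristic mentioned above concrete, the following is a minimal sketch of the classic list-scheduling idea: rank tasks by upward rank, then assign each task to the CE that gives the earliest finish time. It is not the article's implementation. The function name `heft`, the input layout (`succ`, `cost`, `comm`), the example graph, and all timings are hypothetical, and the sketch uses the simpler non-insertion variant (a CE is considered free only after its last assigned task). A primitive with a single implementation is modeled as a task whose cost table has one CE entry.

```python
from statistics import mean


def heft(succ, cost, comm):
    """Simplified HEFT (non-insertion variant), for illustration only.

    succ: task -> list of successor tasks (the DAG edges)
    cost: task -> {ce: execution time}; a single-implementation
          primitive has exactly one entry. Costs are assumed positive.
    comm: (u, v) -> transfer time paid only if u and v run on different CEs
    Returns task -> (ce, start, finish).
    """
    tasks = list(cost)
    pred = {t: [] for t in tasks}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    rank = {}

    def upward_rank(t):
        # Mean cost over the available implementations plus the longest
        # remaining path to an exit task.
        if t not in rank:
            rank[t] = mean(cost[t].values()) + max(
                (comm.get((t, s), 0) + upward_rank(s) for s in succ.get(t, [])),
                default=0,
            )
        return rank[t]

    # Decreasing upward rank is a valid topological order for positive costs.
    order = sorted(tasks, key=upward_rank, reverse=True)

    ce_ready = {}   # CE -> time at which it becomes free
    placement = {}  # task -> (ce, start, finish)
    for t in order:
        best = None
        for ce, w in cost[t].items():
            # Inputs arrive when every predecessor has finished (plus a
            # transfer if it ran on another CE).
            data_ready = max(
                (placement[p][2]
                 + (0 if placement[p][0] == ce else comm.get((p, t), 0))
                 for p in pred[t]),
                default=0,
            )
            start = max(data_ready, ce_ready.get(ce, 0))
            finish = start + w
            if best is None or finish < best[2]:
                best = (ce, start, finish)
        placement[t] = best
        ce_ready[best[0]] = best[2]
    return placement


# Hypothetical four-primitive graph: "b" exists only for the GPU,
# "d" only for the CPU; "a" and "c" have both implementations.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {
    "a": {"cpu": 2, "gpu": 1},
    "b": {"gpu": 3},
    "c": {"cpu": 4, "gpu": 2},
    "d": {"cpu": 1},
}
schedule = heft(succ, cost, comm={})
makespan = max(finish for _, _, finish in schedule.values())
```

In this toy graph, the greedy choice places "a" on the GPU because that finishes it earliest, and the GPU-only primitive "b" must then queue on the same CE. This is the kind of interaction between multi-implementation and single-implementation primitives that the abstract identifies as a source of load imbalance; how XEFT resolves it is not reproduced here.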
