OpTIFlow – An optimized end-to-end dataflow for accelerating deep learning workloads on heterogeneous SoCs

Shyam Jagannathan, Piyali Goswami, Marco Herrera, Aniket Limaye, Pramod Swami, Emmanuel Madrigal, Kumar Desappan, Carlos Rodriguez, Vijay Pothukuchi, Rahul Ravikumar, Jesse Villarreal, Mihir Mody, Manu Mathew

doi:10.2352/ei.2023.35.16.avm-113

Abstract

A typical edge compute SoC capable of handling deep learning workloads at low power is heterogeneous by design. It comprises multiple initiators: real-time IPs for capture and display; hardware accelerators for ISP, computer vision, deep learning, and codecs; DSP or ARM cores for general compute; and a GPU for 2D/3D visualization. Every participating initiator transacts with common resources, such as the L3/L4/DDR memory systems, to exchange data seamlessly with the others. Careful orchestration of this dataflow is needed to keep every producer and consumer at full utilization without any drop in real-time performance, which is critical for automotive applications. The software stack for such complex workflows can be intimidating for customers to bring up, and it often acts as an entry barrier that keeps many from even evaluating the device's performance. In this paper we propose techniques, developed on TI's latest TDA4V-Mid SoC targeted at ADAS and autonomous applications, that are designed for ease of use while still delivering device-entitlement-class performance, using open standards such as DL runtimes, OpenVX, and GStreamer.
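
The producer/consumer orchestration described in the abstract can be illustrated, in a deliberately simplified form, as a chain of stages connected by bounded queues. This is a minimal sketch under stated assumptions, not the OpTIFlow implementation: it uses no TI, OpenVX, or GStreamer API, Python threads stand in for hardware initiators, and the stage functions (`f * 2` as a placeholder "ISP", `f + 1` as a placeholder "DL engine") and the `depth` parameter are hypothetical. The bounded queues show the key idea: each stage works on a different frame at the same time, and a full queue blocks its producer (back-pressure) instead of letting buffered data grow without limit.

```python
import queue
import threading

END = object()  # end-of-stream marker passed down the pipeline

def stage(work, inq, outq):
    """One pipeline stage: take a frame, process it, hand it downstream."""
    while True:
        frame = inq.get()
        if frame is END:
            outq.put(END)  # forward end-of-stream so later stages also stop
            return
        outq.put(work(frame))

def sink(inq, results):
    """Final consumer (stand-in for display): collect frames until END."""
    while True:
        frame = inq.get()
        if frame is END:
            return
        results.append(frame)

def run_pipeline(frames, depth=2):
    """Run a hypothetical capture -> ISP -> DL -> display chain.

    `depth` bounds how many frames may wait between two stages; a full
    queue blocks its producer rather than buffering without limit.
    """
    q_isp, q_dl, q_disp = (queue.Queue(maxsize=depth) for _ in range(3))
    results = []
    threads = [
        # Placeholder "ISP" stage (hypothetical work function).
        threading.Thread(target=stage, args=(lambda f: f * 2, q_isp, q_dl)),
        # Placeholder "DL engine" stage (hypothetical work function).
        threading.Thread(target=stage, args=(lambda f: f + 1, q_dl, q_disp)),
        threading.Thread(target=sink, args=(q_disp, results)),
    ]
    for t in threads:
        t.start()
    for f in frames:  # the main thread plays the capture source
        q_isp.put(f)
    q_isp.put(END)
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline(range(5)))
```

Because every stage is a single FIFO consumer, frame order is preserved end to end; on a real SoC the same pattern would be expressed through the framework's own buffer and scheduling mechanisms rather than Python threads.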
