Abstract
Halide, a Domain Specific Language (DSL) for image and array processing, promotes the separation of the functional algorithm from the execution schedule, making it easier for the user to optimize code for different hardware platforms. Halide supports multiple back-end APIs, including CUDA and OpenCL for GPUs. Although many modern Digital Signal Processors (DSPs) support OpenCL, achieving high performance on these devices requires features beyond those needed to support GPUs. Without those features, there is a strict limit to the effectiveness of targeting a DSP through the Halide OpenCL back-end. In this paper, we describe a set of Halide extensions and optimizations required to effectively support a DSP target, including DMA Promotion, Type Width Reduction, and Intrinsic Generation. We evaluate the effects of our optimizations on a Cadence Vision DSP and report the results. On average, we observe an 88X speedup over the baseline generated OpenCL code, and the performance is comparable to handcrafted OpenCL for the target platform.