Abstract

Improving the data locality of tensor data structures is a crucial optimization for maximizing the performance of machine learning and compute-intensive linear algebra applications. While CPUs and GPUs improve data locality by means of automated caching mechanisms, FPGAs let the developer specify how data structures are allocated. Although this feature enables a high degree of customizability, the increasing complexity and memory footprint of modern applications make any manual search for an optimal allocation impractical. For this reason, we propose a compiler optimization that automatically improves the tensor allocation of high-level software descriptions. The optimization is controlled by a flexible cost model that can be tuned through simple yet expressive callback functions, letting the user tailor the optimization strategy to the optimization goal. We tested our methodology by integrating our optimization into the Bambu open-source HLS framework, achieving a 14% speedup on the digit recognition application from the Rosetta benchmark suite. We also evaluated our optimization on the CHStone benchmark suite, achieving an average speedup of 6%, and applied our methodology to two industrial examples from the aerospace domain, obtaining a 15% speedup. Finally, we tested the versatility of our methodology by inserting our optimization into the Clang software optimization flow, achieving a 12% speedup on the Rosetta benchmark when running on a CPU.
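
To make the callback-driven cost model concrete, the following is a minimal C++ sketch of how a user-supplied callback might steer tensor placement across FPGA memory resources. It is written under our own assumptions for illustration only; the names (`TensorInfo`, `Memory`, `CostCallback`, `allocate`) are hypothetical and do not reflect Bambu's actual API.

```cpp
// Illustrative sketch only: a callback-driven cost model for tensor
// allocation. All types and names are assumptions, not Bambu's API.
#include <cstddef>
#include <functional>

// Candidate placements of a tensor among FPGA memory resources.
enum class Memory { BRAM, LUTRAM, URAM, External };

struct TensorInfo {
    std::size_t size_bytes;   // memory footprint of the tensor
    std::size_t access_count; // accesses estimated by static analysis
};

// User-supplied callback: estimated cost of placing a tensor in a
// given memory; lower is better.
using CostCallback = std::function<double(const TensorInfo&, Memory)>;

// The optimizer picks, for each tensor, the placement that minimizes
// the user's cost function.
Memory allocate(const TensorInfo& t, const CostCallback& cost) {
    const Memory options[] = {Memory::BRAM, Memory::LUTRAM,
                              Memory::URAM, Memory::External};
    Memory best = Memory::External;
    double best_cost = cost(t, best);
    for (Memory m : options) {
        double c = cost(t, m);
        if (c < best_cost) { best_cost = c; best = m; }
    }
    return best;
}

// Example callback expressing one possible optimization goal:
// keep small, frequently accessed tensors in on-chip memory.
double latency_cost(const TensorInfo& t, Memory m) {
    double penalty = (m == Memory::External) ? 100.0 : 1.0;
    return penalty * static_cast<double>(t.access_count)
         + static_cast<double>(t.size_bytes) / 1024.0;
}
```

Swapping in a different callback (for example, one penalizing BRAM usage instead of access latency) retargets the same allocation pass to a different optimization goal, which is the flexibility the abstract describes.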
