Automatic cache partitioning method for high-level synthesis

Bryant Jones,Darrin M Hanna

doi:10.1016/j.micpro.2019.02.013

Abstract

Existing algorithms can be automatically translated from software to hardware using High-Level Synthesis (HLS), allowing for quick prototyping or deployment of embedded designs. High-level software is written with a single main memory in mind, whereas hardware designs can take advantage of many parallel memories. The translation and optimization of memory usage, and the generation of resulting architectures, is important for high-performance designs. Tools provide optimizations on memory structures targeting data reuse and partitioning, but generally these are applied separately for a given object in memory. Memory access that cannot be effectively optimized is serialized to the memory, hindering any further parallelization of the surrounding generated hardware.In this work, we present an automated optimization method for creating custom cache memory architectures for HLS generated designs. Our optimization uses runtime profiling data, and is performed at a localized scope. This method combines data reuse savings and memory partitioning to further increase the potential parallelism and alleviate the serialized memory access, increasing performance. Comparisons are made against architectures without this optimization, and against other HLS caching approaches. Results are presented showing this method requires 72% of the number of execution cycles compared to a single-cache design, and 31% compared to designs with no caches.

Full Text