Abstract

The cost and performance of embedded systems heavily depends on the performance of memories it utilizes. Latency of a memory access is one of the major bottlenecks in the system performance. In software compilation, it is known that there are high variations in memory access latency depending on the ways of storing/retrieving variables in code to/from memories. To improve the latency, it needs a technique to maximize the use of memory bandwidth. A burst transfer is well known technique to maximally utilize memory bandwidth. The burst transfer capability offers an average access time reduction of more than 65 % for an eight-word sequential transfer. However, the problem of utilizing such burst transfers has not been generally addressed, and unfortunately, it is not tractable. In this work, we present a new technique that both identifies sequences of single load and store instructions for combining into burst transfers. The proposed technique provides an optimal data placement of nonarray variables to achieve the maximum utilization of burst data transfers. The major contributions of our work are, 1) we prove that the problem is NP-hard and 2) we propose an exact formulation of the problem and an efficient data placement algorithm. From experiments with a set of multimedia benchmarks, we confirm that our proposed technique uses on average 7 times more burst accesses than generated codes from ARM commercial compiler.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call