Abstract

The performance of most embedded systems is critically dependent on the average memory access latency, so improving the cache hit rate can have a significant positive impact on application performance. Modern embedded processors often feature cache locking mechanisms that allow memory blocks to be locked in the cache under software control. Cache locking was primarily designed to offer timing predictability for hard real-time applications; hence, prior techniques focus on employing cache locking to improve the worst-case execution time. However, cache locking can also be quite effective in improving the average-case execution time of general embedded applications. In this paper, we explore static instruction cache locking to improve average-case program performance. We introduce the temporal reuse profile (TRP) to accurately and efficiently model the cost and benefit of locking memory blocks in the cache. We consider two locking mechanisms: line locking and way locking. For each mechanism, we propose a branch-and-bound algorithm and a heuristic approach that use the TRP to determine the most beneficial memory blocks to lock in the cache. Experimental results show that the heuristic approach achieves results close to those of the branch-and-bound algorithm and improves performance by 12% on average for a 4 KB cache across a suite of real-world benchmarks. Moreover, our heuristic provides significant improvement over the state-of-the-art locking algorithm in terms of both performance and efficiency.
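
To make the idea concrete, the following Python sketch shows one way a temporal reuse profile could be built from an instruction-block trace and then used by a simplified greedy selection. This is an illustration only, not the paper's implementation: the function names, the single fully associative set abstraction, and the benefit model (counting any reuse whose distance reaches the associativity as an avoided miss, while ignoring the capacity cost that locking imposes on unlocked blocks) are our assumptions.

    from collections import defaultdict

    def temporal_reuse_profile(trace):
        # For each block m, count how often m is re-accessed with a given
        # reuse distance, where the distance is the number of *distinct*
        # blocks referenced between two consecutive accesses to m.
        trp = defaultdict(lambda: defaultdict(int))  # trp[block][distance] = count
        last_seen = {}                               # block -> index of last access
        for i, block in enumerate(trace):
            if block in last_seen:
                distinct = len(set(trace[last_seen[block] + 1 : i]))
                trp[block][distinct] += 1
            last_seen[block] = i
        return trp

    def greedy_lock_selection(trp, ways):
        # Simplified benefit of locking m: accesses to m whose reuse distance
        # is at least the associativity, i.e. likely misses that locking turns
        # into guaranteed hits. One way is left unlocked for the remaining blocks.
        benefit = {
            m: sum(count for dist, count in hist.items() if dist >= ways)
            for m, hist in trp.items()
        }
        ranked = sorted(benefit, key=benefit.get, reverse=True)
        return ranked[: ways - 1]

For example, greedy_lock_selection(temporal_reuse_profile(['a', 'b', 'c', 'a', 'b', 'c']), 2) would lock the block whose reuses consistently exceed the associativity. The heuristic proposed in the paper is more refined than this sketch in that it models both the benefit of locking a block and the cost to the blocks that remain unlocked.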
