Abstract

General purpose architectures are designed to offer average high performance regardless of the particular application that is being run. Performance and power inefficiencies appear as a consequence for some programs. Reconfigurable hardware (cache hierarchy, branch predictor, execution units, bandwidth, etc.) has been proposed to overcome these inefficiencies by dynamically adapting the architecture to the application needs. However, nearly all the proposals use indirect measures or heuristics of performance to decide new configurations, what may lead to inefficiencies. In this paper we propose a runtime mechanism that allows to predict the throughput of an application on an architecture using a reconfigurable L2 cache. L2 cache size varies at a way granularity and we predict the performance of the same application on all other L2 cache sizes at the same time. We obtain for different L2 cache sizes an average error of 3.11%, a maximum error of 16.4% and standard deviation of 3.7%. No profiling or operating system participation is needed in this mechanism. We also give a hardware implementation that allows to reduce the hardware cost under 0.4% of the total L2 size and maintains high accuracy. This mechanism can be used to reduce power consumption in single threaded architectures and improve performance in multithreaded architectures that dynamically partition shared L2 caches.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call