To improve the energy efficiency of computation, accelerators trade flexibility for performance and energy consumption. Fixed-function accelerators reach high energy efficiency but are inflexible. Adding programmability via an instruction set architecture (ISA) incurs an energy overhead, as instructions must be fetched and decoded. To reduce this overhead, embedded processors employ hardware-controlled instruction caches and software-controlled components, such as loop buffers and (programmable) dictionaries, that improve the energy efficiency of the instruction stream. Reducing the instruction overhead with code compression is well established, and dictionary compression has been an effective approach due to its simplicity. Making the dictionary programmable improves effectiveness compared to a static dictionary. However, run-time-programmable dictionary compression and its effect on energy consumption have not been thoroughly studied. We describe a scheme that targets energy efficiency by using fine-grained programmable dictionaries in embedded compute devices. Guided by compile-time analysis, the dictionary contents are changed during execution. On the CHStone and Embench benchmark suites, our method reduces energy consumption on average by 11.4% and 3.8%, respectively, with negligible run-time overhead. The addition of a loop buffer further reduces energy consumption by 19.8% and 4.5% in the two suites. Our results indicate that programmable dictionary compression allows further energy reductions over an already highly tuned instruction stream.
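To illustrate the underlying idea, the following is a minimal sketch of dictionary-based instruction compression: frequent instruction words are replaced by short dictionary indices, and the dictionary can be reprogrammed between program phases. All names and the encoding are assumptions for illustration, not the paper's actual scheme.

```python
from collections import Counter

def build_dictionary(instructions, size):
    """Pick the `size` most frequent instruction words (a stand-in
    for the paper's compile-time analysis of the instruction stream)."""
    return [word for word, _ in Counter(instructions).most_common(size)]

def compress(instructions, dictionary):
    """Replace dictionary hits with short indices; keep misses verbatim."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [("idx", index[w]) if w in index else ("raw", w)
            for w in instructions]

def decompress(stream, dictionary):
    """Expand indices back to full instruction words."""
    return [dictionary[v] if tag == "idx" else v for tag, v in stream]

# Usage: a hot loop dominated by a few repeated instruction words.
trace = ["add", "ld", "add", "st", "add", "ld", "br"]
d = build_dictionary(trace, size=2)   # -> ["add", "ld"]
packed = compress(trace, d)
assert decompress(packed, d) == trace
```

A run-time-programmable dictionary would rebuild `d` when execution enters a new phase with a different instruction mix, whereas a static dictionary is fixed once for the whole program.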