Abstract
A machine learning (ML) design framework is proposed for adaptively adjusting the clock frequency based on the propagation delay of individual instructions. A random forest model is trained to classify propagation delays in real time, using the current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within the TigerMIPS processor. The modified system is experimentally tested at the gate level in 45 nm CMOS technology. Compared with the baseline TigerMIPS, it simultaneously exhibits a speedup of 70 percent and an energy reduction of 30 percent with coarse-grained ML classification. With finer granularities, a speedup of 89 percent is demonstrated with a simultaneous 15.5 percent reduction in energy consumption.
Highlights
The primary design goal in computer architecture is to maximize the performance of a system under power, area, temperature, and other application-specific constraints.
The clock frequency of the baseline processor is set to 250 MHz, based on the worst-case propagation delay reported by Synopsys Design Compiler.
Classification of instructions into delay intervals in real time alleviates the variation in path propagation delay imposed by process, voltage, and temperature (PVT) variations and system aging.
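To make the idea of delay intervals concrete, the following is a minimal sketch of how per-instruction delay classes could translate into a speedup over a fixed worst-case clock. It is not the authors' implementation: the function names, the class boundaries, and the delay values are all hypothetical, an oracle lookup stands in for the trained random forest, and the overhead of the extra pipeline stage and of clock switching is ignored. Only the 250 MHz baseline comes from the text above.

```python
# Baseline clock from the text: 250 MHz, i.e. a 4.0 ns period for every instruction.
BASELINE_PERIOD_NS = 1e3 / 250.0


def classify_delay(delay_ns, class_bounds_ns):
    """Return the smallest class upper bound (ns) that covers delay_ns.

    Hypothetical oracle: it sees the true delay, whereas the paper's
    random forest predicts the class from operation type, operands,
    and computation history.
    """
    for bound in sorted(class_bounds_ns):
        if delay_ns <= bound:
            return bound
    raise ValueError("delay exceeds the largest class bound")


def adaptive_speedup(delays_ns, class_bounds_ns, baseline_ns=BASELINE_PERIOD_NS):
    """Ratio of fixed-clock run time to adaptive-clock run time."""
    # Each instruction is clocked at the period of its delay class.
    adaptive_time = sum(classify_delay(d, class_bounds_ns) for d in delays_ns)
    # The baseline clocks every instruction at the worst-case period.
    baseline_time = baseline_ns * len(delays_ns)
    return baseline_time / adaptive_time


# Made-up per-instruction propagation delays in ns (illustration only).
delays = [1.0, 1.8, 3.9, 0.9]
coarse = adaptive_speedup(delays, [2.0, 4.0])            # two delay classes
fine = adaptive_speedup(delays, [1.0, 2.0, 3.0, 4.0])    # four delay classes
print(coarse, fine)
```

With these made-up delays, two classes give a ratio of 1.6 and four classes give 2.0, which illustrates why finer classification granularity can yield a larger speedup. The numbers are unrelated to the 70 and 89 percent figures reported in the abstract.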
Summary
Arash Fouman Ajirlou, Student Member, IEEE, and Inna Partin-Vaisband, Member, IEEE.