Abstract

A machine learning (ML) design framework is proposed for adaptively adjusting clock frequency based on the propagation delay of individual instructions. A random forest model is trained to classify propagation delays in real time, using the current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within the TigerMIPS processor. The modified system is experimentally tested at the gate level in 45 nm CMOS technology. Compared with the baseline TigerMIPS, coarse-grained ML classification yields a speedup of 70 percent together with an energy reduction of 30 percent. Finer classification granularity yields a speedup of 89 percent together with a 15.5 percent reduction in energy consumption.
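The notion of coarse versus fine classification granularity can be illustrated with a minimal sketch. It covers only the labeling side (mapping a known delay to a delay class and a safe clock period), not the random forest that predicts the class from instruction features. The 4 ns worst-case period follows from the 250 MHz baseline clock; the equal-width intervals, the class counts, and all function names are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_left

# 250 MHz baseline clock corresponds to a 4 ns worst-case period.
WORST_CASE_NS = 4.0


def make_class_bounds(num_classes, worst_case_ns=WORST_CASE_NS):
    """Upper bounds of equal-width delay intervals (assumed layout)."""
    step = worst_case_ns / num_classes
    return [step * (i + 1) for i in range(num_classes)]


def delay_class(delay_ns, bounds):
    """Index of the first interval whose upper bound covers the delay."""
    if delay_ns <= 0 or delay_ns > bounds[-1]:
        raise ValueError("delay outside the modeled range")
    return bisect_left(bounds, delay_ns)


def clock_period(delay_ns, bounds):
    """Shortest class period that is still long enough for this delay."""
    return bounds[delay_class(delay_ns, bounds)]


coarse = make_class_bounds(2)  # hypothetical coarse-grained setting
fine = make_class_bounds(8)    # hypothetical fine-grained setting

# The same 1.3 ns instruction gets a tighter period with finer classes.
print(clock_period(1.3, coarse), clock_period(1.3, fine))
```

Finer intervals leave less slack between the actual delay and the selected period, which is consistent with the larger speedup reported for finer granularities, although a finer classifier is also a harder prediction problem.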

Highlights

  • The primary design goal in computer architecture is to maximize the performance of a system under power, area, temperature, and other application-specific constraints

  • The clock frequency of the baseline processor is set to 250 MHz, determined from the worst-case propagation delay reported by Synopsys Design Compiler

  • Classification of instructions into delay intervals in real time alleviates the path propagation variances imposed by PVT variations and system aging
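To see why per-instruction delay classes can outperform a single worst-case clock, consider the following back-of-the-envelope sketch. Only the 250 MHz baseline comes from the text above; the instruction mix and class periods are hypothetical, and the calculation ignores misclassification, the latency of the added ML stage, and any clock-switching overhead.

```python
BASELINE_MHZ = 250.0
BASELINE_PERIOD_NS = 1000.0 / BASELINE_MHZ  # 4.0 ns worst-case period


def average_period_ns(class_mix):
    """Mean clock period for a list of (fraction, period_ns) pairs."""
    total = sum(fraction for fraction, _ in class_mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * period for fraction, period in class_mix)


def speedup_percent(class_mix):
    """Speedup over clocking every instruction at the worst-case period."""
    return (BASELINE_PERIOD_NS / average_period_ns(class_mix) - 1.0) * 100.0


# Hypothetical mix: half of the instructions fit in 2 ns, half need 4 ns.
mix = [(0.5, 2.0), (0.5, 4.0)]
print(f"average period: {average_period_ns(mix):.2f} ns")
print(f"speedup: {speedup_percent(mix):.1f} percent")
```

With this illustrative mix the average period drops from 4 ns to 3 ns, about a 33 percent speedup. The reported gains depend on the measured delay distribution of real workloads, which this sketch does not model.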


Summary

A Machine Learning Pipeline Stage for Adaptive Frequency Adjustment

Arash Fouman Ajirlou, Student Member, IEEE, and Inna Partin-Vaisband, Member, IEEE

INTRODUCTION
PRIOR AND RELATED WORK
THE PROPOSED ML-BASED FREQUENCY ADJUSTMENT
Phase 1
Phase 2
Phase 3
MACHINE LEARNING MODELS
Random Forest
ML Algorithm Tradeoffs
Baseline Processor
IMPLEMENTATION
EXPERIMENTAL RESULTS
CONCLUSION AND FUTURE WORK
SUMMARY
