Abstract

This paper presents an integrated circuit (IC) realization of a random forest (RF) machine learning classifier in 65-nm CMOS. Algorithm, architecture, and circuits are co-optimized to achieve aggressive energy and delay benefits by exploiting the inherent error resiliency that comes from the ensemble nature of an RF classifier. Deterministic sub-sampling (DSS) and regularized decision trees reduce interconnect complexity and avoid irregular memory access patterns and computations, thereby reducing the energy-delay product (EDP). The prototype IC also employs low-swing analog in-memory computations embedded in a standard 6T SRAM to enable massively parallel tree node comparisons, which minimizes memory fetches and reduces the EDP further. Compared to a conventional digital system, the 65-nm CMOS prototype IC achieves $3.1\times$ higher energy efficiency and $2.2\times$ higher throughput, leading to $6.8\times$ lower EDP, at the same accuracies of 94% and 97.5% for two tasks: 1) eight-class traffic sign recognition and 2) face detection, respectively.
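
As a quick consistency check on the reported figures, the EDP improvement can be recovered from the two individual gains. This is a sketch under two assumptions not stated in the abstract: that EDP is the product of energy per decision and delay per decision, and that delay per decision scales inversely with throughput. The symbols $E$ and $D$ are introduced here only for illustration.

$$
\frac{\mathrm{EDP}_{\text{digital}}}{\mathrm{EDP}_{\text{prototype}}}
= \frac{E_{\text{digital}}}{E_{\text{prototype}}} \cdot \frac{D_{\text{digital}}}{D_{\text{prototype}}}
= 3.1 \times 2.2 = 6.82 \approx 6.8
$$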
