Abstract

Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNNs). However, these models are difficult to optimize for fast inference at scale without accuracy loss on von Neumann architectures due to non-uniform memory access patterns. Recently, we proposed a novel analog content addressable memory (CAM) based on emerging memristor devices for fast look-up table operations. Here, we propose for the first time to use the analog CAM as an in-memory computational primitive to accelerate tree-based model inference. We demonstrate an efficient mapping algorithm leveraging the new analog CAM capabilities such that each root-to-leaf path of a Decision Tree is programmed into a row. This new in-memory compute concept enables few-cycle model inference, dramatically increasing throughput by 10³× over conventional approaches.
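
To illustrate the mapping idea, here is a minimal Python sketch, not the authors' implementation: it walks a trained scikit-learn Decision Tree and converts each root-to-leaf path into one row of per-feature [low, high] ranges, the structure an analog CAM row would store (the helper name tree_to_rows is hypothetical).

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    def tree_to_rows(clf, n_features):
        # One (ranges, label) pair per root-to-leaf path, where
        # ranges[f] = [low, high] means the row matches an input x
        # whenever low < x[f] <= high for every feature f.
        t = clf.tree_
        rows = []

        def walk(node, ranges):
            if t.children_left[node] == -1:      # leaf: emit a CAM row
                rows.append((ranges, int(np.argmax(t.value[node]))))
                return
            f, thr = t.feature[node], t.threshold[node]
            left = [r.copy() for r in ranges]    # left branch: x[f] <= thr
            left[f][1] = min(left[f][1], thr)
            walk(t.children_left[node], left)
            right = [r.copy() for r in ranges]   # right branch: x[f] > thr
            right[f][0] = max(right[f][0], thr)
            walk(t.children_right[node], right)

        walk(0, [[-np.inf, np.inf] for _ in range(n_features)])
        return rows

    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    rows = tree_to_rows(clf, X.shape[1])
    print(len(rows), "CAM rows, one per root-to-leaf path")

Each such row would be programmed once into the analog CAM; inference then reduces to a single parallel search across all rows.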

Highlights

  • Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNNs)

  • Other traditional CMOS accelerator approaches have been studied for these models, such as a random forest (RF) in-memory computing (IMC) accelerator based on complementary-metal-oxide-semiconductor (CMOS) static random access memories (SRAM) in ref. 27, but high-throughput, low-energy model inference remains a challenge

  • With analog content addressable memory (CAM) hardware, the highly irregular memory lookup patterns of tree-based machine learning models can be accelerated with IMC architectures, due to the analog CAM capability to store ranges of values and search analog data (a software sketch of this search follows the list)
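
To make the search semantics concrete, below is a minimal Python sketch, assuming rows of per-feature [low, high] bounds like those produced by the mapping sketch above; the function name cam_search is hypothetical, and NumPy broadcasting stands in for the analog CAM's single-cycle parallel lookup across all rows.

    import numpy as np

    def cam_search(lows, highs, labels, x):
        # lows/highs: (n_rows, n_features) range bounds per CAM row
        # labels:     (n_rows,) class label stored alongside each row
        # A row matches only if every feature of x falls in its range
        # (lower bound exclusive, upper bound inclusive, matching the
        # x[f] <= threshold convention of the mapping sketch above).
        match = np.all((x > lows) & (x <= highs), axis=1)
        return labels[match]

    # Two toy rows over two features; the query falls in row 0's box.
    lows   = np.array([[-np.inf, -np.inf], [0.5, -np.inf]])
    highs  = np.array([[0.5,     np.inf], [np.inf,  2.0]])
    labels = np.array([0, 1])
    print(cam_search(lows, highs, labels, np.array([0.3, 1.0])))  # [0]

In hardware, every row is evaluated simultaneously rather than sequentially, which is what enables the few-cycle inference described in the abstract.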

Introduction

Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNNs). These models are difficult to optimize for fast inference at scale without accuracy loss on von Neumann architectures due to non-uniform memory access patterns. DNNs are unsuitable for multiple government[2] and industry[3] applications where inspectability and explainability are critical, training data may be limited, or domain knowledge and historical expertise need to be incorporated in critical decisions. These applications include those in the medical space[4,5], where fast and accurate clinical assessment of a disease is critical, as is a deep understanding of the reasons behind a specific model classification, in order to rapidly prepare treatments. Other traditional CMOS accelerator approaches have been studied for these models, such as a random forest (RF) in-memory computing (IMC) accelerator based on complementary-metal-oxide-semiconductor (CMOS) static random access memories (SRAM) in ref. 27, but high-throughput, low-energy model inference remains a challenge.
