Abstract

Recently, machine learning researchers have been designing algorithms that can run on embedded and mobile devices, which introduces additional constraints compared to traditional algorithm design. One of these constraints is energy consumption, which directly translates to battery life on such devices. Streaming algorithms, such as the Very Fast Decision Tree (VFDT), are well suited to these devices because of their high processing speed and low memory requirements. However, they were not designed with a focus on energy efficiency. This paper addresses this challenge by presenting the nmin adaptation method, which reduces the energy consumption of the VFDT algorithm with only minor effects on accuracy. nmin adaptation allows the tree to grow faster in those branches where there is higher confidence to create a split, and delays the split on the less confident branches. This removes unnecessary computations related to checking for splits while maintaining similar levels of accuracy. We have conducted extensive experiments on 29 public datasets, showing that the VFDT with nmin adaptation consumes up to 31% less energy than the original VFDT, and up to 96% less energy than the CVFDT (a VFDT adapted to concept drift scenarios), trading off up to 1.7 percent of accuracy.

Highlights

  • State-of-the-art machine learning algorithms are being designed to run at the edge, which creates new time, memory, and energy requirements

  • We evaluate accuracy as the percentage of correctly classified instances, and energy consumption as estimated by the Intel Power Gadget tool, summing the energy consumed by the processor and the DRAM to obtain the total energy consumption

  • This paper introduces nmin adaptation, a method that extends standard Hoeffding trees to reduce their energy consumption. nmin adaptation allows for faster growth on the branches with higher confidence to split and delays growth on the less confident branches

Summary

Introduction

State-of-the-art machine learning algorithms are being designed to run at the edge, which creates new time, memory, and energy requirements. Streaming algorithms fulfill the time and memory requirements by building models in real time, processing data at high velocity with low memory consumption. Energy consumption, however, was not considered during the design of the VFDT and other state-of-the-art streaming algorithms. The nmin parameter sets the minimum number of instances (batch size) that must be observed at a leaf before checking for a possible split. To update the statistics, the algorithm maintains a table at each leaf with the observed attribute and class values. After nmin instances have been read at that leaf, the algorithm calculates the information gain (G) of every observed attribute. If the difference in information gain between the two best attributes is smaller than the Hoeffding bound ε, and ε < τ, a tie occurs, and the algorithm splits on either of the two top attributes, since they have very similar information gain values.
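The split-check logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default values for δ and τ, and the assumption that information gain ranges over [0, 1] are ours. The `adapted_nmin` function captures one plausible reading of the nmin adaptation idea, estimating how many instances are needed before ε shrinks enough for a decision, so the leaf is not checked again too early.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    random variable with range `value_range` lies within epsilon of the
    empirical mean after n observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def check_split(gains, n, delta=1e-7, tau=0.05, value_range=1.0):
    """Decide whether a leaf should split after observing n instances.

    gains: information gain of each candidate attribute at this leaf.
    Returns "split", "tie", or "no-split".
    """
    best, second = sorted(gains, reverse=True)[:2]
    eps = hoeffding_bound(value_range, delta, n)
    if best - second > eps:   # best attribute is clearly superior
        return "split"
    if eps < tau:             # top attributes are nearly equal: tie
        return "tie"
    return "no-split"

def adapted_nmin(gains, delta=1e-7, tau=0.05, value_range=1.0):
    """Sketch of nmin adaptation (our reading, not the paper's exact rule):
    estimate the number of instances needed for epsilon to fall below the
    current gain difference (or below tau, whichever is larger), and use
    that as the next per-leaf nmin instead of a fixed batch size."""
    best, second = sorted(gains, reverse=True)[:2]
    target = max(best - second, tau)
    return math.ceil(
        (value_range ** 2) * math.log(1.0 / delta) / (2.0 * target ** 2)
    )
```

With a fixed nmin, every leaf re-runs `check_split` on the same schedule; adapting nmin per leaf skips checks that are very unlikely to trigger a split, which is where the energy savings come from.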
