Abstract

In Internet of Things (IoT) scenarios, it is challenging to deploy Machine Learning (ML) algorithms on low-cost Field Programmable Gate Arrays (FPGAs) in a real-time, cost-efficient, and high-performance way. This paper introduces Machine Learning on FPGA (MLoF), a series of ML IP cores implemented on low-cost FPGA platforms, aimed at helping IoT developers achieve comprehensive performance across various tasks. Using Verilog, we deploy and accelerate Artificial Neural Networks (ANNs), Decision Trees (DTs), K-Nearest Neighbors (k-NNs), and Support Vector Machines (SVMs) on 10 different FPGA development boards from seven manufacturers. Additionally, we analyze and evaluate our designs on six datasets, and compare the best-performing FPGAs with traditional SoC-based systems, including the NVIDIA Jetson Nano, Raspberry Pi 3B+, and STM32L476 Nucleo. The results show that Lattice's ICE40UP5 achieves the best overall performance with low power consumption: on it, MLoF reduces power consumption by 891% and increases performance by 9 times on average. Moreover, its cost-power-latency product (CPLP) outperforms SoC-based systems by 25 times, demonstrating the significance of MLoF for endpoint deployment of ML algorithms. Furthermore, we open-source all of the code to promote future research.
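The CPLP comparison above can be illustrated with a short sketch. This is a hypothetical simplification that treats CPLP as the plain product of board cost, average power, and inference latency; the paper's exact formulation and all numbers below are assumptions, not measurements from the study:

```python
# Hypothetical illustration of a cost-power-latency product (CPLP).
# Assumption: CPLP = board cost (USD) * average power (W) * latency (s);
# lower is better. All figures are made-up placeholders.

def cplp(cost_usd: float, power_w: float, latency_s: float) -> float:
    """Return the cost-power-latency product; lower means a better trade-off."""
    return cost_usd * power_w * latency_s

# Compare two hypothetical platforms.
fpga = cplp(cost_usd=5.0, power_w=0.01, latency_s=0.001)    # low-cost FPGA
soc = cplp(cost_usd=35.0, power_w=2.5, latency_s=0.0005)    # SoC board

print(f"FPGA CPLP: {fpga:.6f}")
print(f"SoC  CPLP: {soc:.6f}")
print(f"SoC/FPGA ratio: {soc / fpga:.1f}x")
```

The point of such a combined metric is that a platform can win on any single axis (say, latency) while still offering a worse overall trade-off once cost and power are factored in.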

Highlights

  • Machine Learning (ML) algorithms are effective and efficient in processing Internet of Things (IoT) endpoint data with good robustness [1]

  • We introduce Machine Learning on Field Programmable Gate Arrays (FPGAs) (MLoF) with a series of Intellectual Property (IP) cores dedicated to low-cost FPGAs

  • Compared with typical embedded implementations of machine learning algorithms, including the NVIDIA Jetson Nano, Raspberry Pi 3B+, and STM32L476 Nucleo, MLoF balances cost, performance, and power consumption


Summary

Introduction

Machine Learning (ML) algorithms are effective and efficient in processing Internet of Things (IoT) endpoint data with good robustness [1]. TensorFlow Lite [3], X-CUBE-AI [4], and the Cortex Microcontroller Software Interface Standard Neural Network (CMSIS-NN) [5] are three frameworks proposed by Google, STM, and ARM for running pre-trained models on embedded systems. These solutions cannot simultaneously balance power consumption, cost-efficiency, and high performance for IoT endpoint ML implementations. Compared with typical embedded implementations of machine learning algorithms, including the NVIDIA Jetson Nano, Raspberry Pi 3B+, and STM32L476 Nucleo, MLoF balances cost, performance, and power consumption. These IP cores are open-source, helping developers and researchers implement machine learning algorithms on their endpoint devices more efficiently.

Related Work
Machine Learning Algorithms Implementation on Low-Cost FPGAs
Activation Function
Experimental Analysis and Result
Implementation and Analysis of ANN Hardware
Implementation and Analysis of DT Hardware
Implementation and Analysis of k-NN
Findings
Comparisons with Embedded Platforms