Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Jurgen Vandendriessche, Abdellah Touhafi, Mohamed Yassin Chkouri, Nick Wouters, Mimoun Lamrini, Bruno Da Silva

doi:10.3390/electronics10212622

Open Access

Abstract

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. Techniques for automated sound recognition often rely on machine learning approaches, which have grown in complexity in order to achieve higher accuracy. Nonetheless, such techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared to other embedded devices. Similarly, dedicated architectures to accelerate Artificial Intelligence (AI), such as Tensor Processing Units (TPUs), promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.

Highlights

Published: 27 October 2021
Environmental Sound Recognition (ESR), especially in urban environments, is becoming a relevant feature for many applications, from monitoring traffic in smart cities [1] and monitoring criminal activity [2], to monitoring noise disturbances in residential or natural areas [3,4], such as the noise produced by overflying airplanes taking off or landing at a nearby airport [5].
To place the performance of the Field-Programmable Gate Arrays (FPGAs) into context, the Zynq Z-7020 with CNN1D is compared with the Raspberry Pi (RPi) 4B and the Coral DevBoard with its Tensor Processing Unit (TPU) in Tables 8–11 for all datasets.
The solution generated using hls4ml for the Zynq Z-7020 FPGA is faster than any solution on other devices (PTQ 8-bit model on an RPi 4B), while maintaining high accuracy.

Summary

Introduction

Environmental Sound Recognition (ESR), especially in urban environments, is becoming a relevant feature for many applications, from monitoring traffic in smart cities [1] and monitoring criminal activity [2], to monitoring noise disturbances in residential or natural areas [3,4], such as the noise produced by overflying airplanes taking off or landing at a nearby airport [5]. Embedded platforms are often used to deploy intelligent sound monitoring devices. Several methods have been explored in the past to accurately classify urban sounds using constrained embedded devices. Recent advances in machine learning show that Deep Neural Networks (DNNs), and Convolutional Neural Networks (CNNs) in particular, provide high accuracy for sound recognition [7]. However, they often demand computationally intensive operations, which results in limited accuracy and slow response times when they are ported to embedded devices.

Objectives

Findings

Discussion

Conclusion