Abstract

Over the past years, deep neural networks (DNNs) have quickly grown into the state-of-the-art technology for various machine learning tasks such as image and speech recognition or natural language processing. However, as DNN-based applications typically require significant amounts of computation, running DNNs on resource-constrained devices still constitutes a challenge, especially for real-time applications such as low-latency audio processing. In this paper, we aimed to perform real-time noise suppression on a low-cost embedded platform with limited resources, using a pre-trained DNN-based speech enhancement model. A portable setup was employed, consisting of a Raspberry Pi 3 Model B+ fitted with a soundcard and headphones. A (basic) low-latency Python framework was developed to accommodate an audio processing algorithm operating in a real-time environment. Various layouts and trainable parameters of the DNN-based model as well as different processing time intervals (from 64 down to 8 ms) were tested and compared using objective metrics (e.g. PESQ, segSNR) to achieve the best possible trade-off between noise suppression performance and audio latency. We show that 10-layer DNNs with up to 350,000 trainable parameters can successfully be implemented on the Raspberry Pi 3 Model B+ and yield latencies below 16 ms for real-time audio applications.
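The abstract describes block-based processing at intervals from 8 to 64 ms. The minimal sketch below illustrates the real-time constraint this implies: each frame must be enhanced within its own duration, or audio dropouts occur. The 16 kHz sample rate and the `enhance()` placeholder are assumptions for illustration; the paper's actual framework runs a pre-trained DNN forward pass per frame.

```python
# Sketch of a frame-based real-time processing loop (assumptions: 16 kHz
# sample rate, enhance() as a stand-in for the DNN speech enhancement model).
import time

SAMPLE_RATE = 16000                            # Hz (assumed)
FRAME_MS = 8                                   # processing interval, 8-64 ms per the abstract
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000     # samples per frame (128 here)

def enhance(frame):
    """Placeholder for the DNN forward pass; here it just copies the frame."""
    return list(frame)

def process_stream(samples):
    """Split `samples` into frames, enhance each, and verify the deadline."""
    out = []
    deadline = FRAME_MS / 1000.0               # seconds available per frame
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        t0 = time.perf_counter()
        out.extend(enhance(samples[start:start + FRAME_LEN]))
        elapsed = time.perf_counter() - t0
        # In a real-time system, overrunning this deadline causes dropouts.
        assert elapsed < deadline, "frame processing overran its deadline"
    return out

one_second = [0.0] * SAMPLE_RATE
enhanced = process_stream(one_second)
print(len(enhanced), FRAME_LEN)                # 16000 128
```

Shrinking `FRAME_MS` lowers the algorithmic latency (the trade-off the paper measures) but also shrinks the compute budget per frame, which is why smaller intervals constrain the feasible DNN size on the Raspberry Pi.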
