Proposed CNN Model for Audio Recognition on Embedded Device

Minh Pham Ngoc, Tan Ngo Duy, Hoan Huynh Duc, Kiet Tran Anh

doi:10.3991/ijim.v18i08.45917

Open Access

Abstract

The audio detection system enables autonomous cars to recognize their surroundings based on the noise produced by moving vehicles. This paper proposes a machine learning model based on convolutional neural networks (CNN), integrated into an embedded system equipped with a microphone. The system comprises a specialized microphone and a main processor. The microphone delivers an accurate analog signal to the main processor, which analyzes the recorded signal and returns a prediction. While designing adequate hardware is a crucial task that directly affects the predictive capability of the system, it is equally important to train a CNN model with high accuracy. To this end, a dataset of over 3000 WAV files, each up to 5 seconds long and covering four classes, was obtained from open-source research. The dataset is divided into training, validation, and testing sets. The training data is converted into spectrogram images before the CNN is trained. Finally, the trained model is evaluated on the testing set, yielding an accuracy of 77.54%.
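
The data preparation steps described in the abstract (splitting the dataset and converting audio into spectrograms) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the split ratios, the spectrogram parameters, or the class names, so the 70/15/15 split, the 64-sample Hann-windowed frames, the hop of 32 samples, and the helper names `split_dataset` and `spectrogram` are all assumptions made for the example. A practical pipeline would more likely use an FFT-based library routine than the plain DFT shown here.

```python
import cmath
import math
import random


def split_dataset(files_by_class, train_pct=70, val_pct=15, seed=0):
    """Split each class's file list into train/validation/test subsets.

    The 70/15/15 percentages are an assumption; the abstract gives no ratios.
    Splitting per class keeps the class balance similar across subsets.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        n_train = len(shuffled) * train_pct // 100
        n_val = len(shuffled) * val_pct // 100
        splits["train"] += [(f, label) for f in shuffled[:n_train]]
        splits["val"] += [(f, label) for f in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(f, label) for f in shuffled[n_train + n_val:]]
    return splits


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram from a Hann-windowed short-time DFT.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes
    (the non-negative frequency bins). Frame length and hop are assumed values.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


if __name__ == "__main__":
    # Hypothetical file names and class labels, for illustration only.
    files = {f"class_{c}": [f"class_{c}_{i}.wav" for i in range(20)]
             for c in range(4)}
    parts = split_dataset(files)
    print({name: len(items) for name, items in parts.items()})

    # Synthetic 128 Hz tone sampled at 1024 Hz stands in for a WAV recording.
    tone = [math.sin(2 * math.pi * 128 * t / 1024) for t in range(512)]
    spec = spectrogram(tone)
    print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

Each row of the returned spectrogram is one time frame, so the resulting two-dimensional array can be treated as a single-channel image and passed to a CNN, which is the general idea behind the spectrogram technique mentioned above.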
