Acoustic perception of the automotive environment has the potential to advance driving potentials with enhanced safety. The challenge arises when these acoustic perception systems need to perform under resource and power constraints on edge devices. Neuromorphic computing has introduced spiking neural networks in the context of ultra-low power sensory edge devices. Spiking architectures leverage biological plausibility to achieve computational capabilities, accurate performance, and great compatibility with neuromorphic hardware. In this work, we explore the depths of spiking neurons and feature components with the acoustic scene analysis task for siren sounds. This research work aims to address the qualitative analysis of sliding windows’ variation on the feature extraction front of the preprocessing pipeline. Optimization of the parameters to exploit the feature extraction stage facilitates the advancement of the performance of the acoustics anomaly detection task. We exploit the parameters for mel spectrogram features and FFT calculations, prone to be suitable for computations in hardware. We conduct experiments with different window sizes and the overlapping ratio within the windows. We present our results for performance measures like accuracy and onset latency to provide an insight on the choice of optimal window. The non-trivial motivation of this research is to understand the effect of encoding behavior of spiking neurons with different windows. We further investigate the heterogeneous nature of membrane and synaptic time constants and their impact on the accuracy of anomaly detection. On a large scale audio dataset comprising of siren sounds and road traffic noises, we obtain accurate predictions of siren sounds using a recurrent spiking neural network. The baseline dataset comprising siren and noise sequences is enriched with a bird dataset to evaluate the model with unseen samples.
Read full abstract