Abstract

We propose an end-to-end acoustic scene analysis framework with distributed sound-to-light conversion devices called Blinkies. Blinkies transmit sound information as the intensity of an on-board light-emitting diode (LED). A video camera can then easily collect acoustic information by capturing the LED intensities of multiple Blinkies distributed over a large area. However, the transmitted signal is band-limited by the video camera's frame rate, typically 30 frames per second. We aim to optimize the sound-to-light conversion process for acoustic scene analysis under this bandwidth constraint. During light-signal propagation in air, the signal is further degraded by physical effects such as light attenuation and noise. We model these physical constraints as differentiable physical layers, which enable us to train two deep neural networks (DNNs), one for sound-to-light conversion and one for acoustic scene analysis, in an end-to-end manner. Simulation experiments of acoustic scene analysis using a DCASE 2018 dataset show that the proposed framework achieves higher accuracy than the previous Blinky-based framework. This result suggests the suitability of Blinkies for acoustic scene analysis.
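
As a rough illustration (not taken from the paper itself), the end-to-end idea can be sketched in PyTorch: a differentiable "physical layer" modeling light attenuation and additive noise sits between the sound-to-light encoder and the scene classifier, so the scene-analysis loss can backpropagate through the simulated channel into the encoder. All module names, architectures, and parameter values below are hypothetical placeholders, assumed only for the sketch.

```python
import torch
import torch.nn as nn

class PhysicalLayer(nn.Module):
    """Differentiable stand-in for light propagation in air: the LED
    intensity is attenuated on its way to the camera and corrupted by
    additive noise. Because every operation is differentiable, gradients
    from the scene-analysis loss flow back into the sound-to-light encoder."""

    def __init__(self, attenuation: float = 0.5, noise_std: float = 0.01):
        super().__init__()
        self.attenuation = attenuation  # hypothetical fixed propagation gain
        self.noise_std = noise_std      # hypothetical camera-noise level

    def forward(self, led: torch.Tensor) -> torch.Tensor:
        observed = self.attenuation * led
        if self.training:  # inject observation noise during training only
            observed = observed + self.noise_std * torch.randn_like(observed)
        # Camera pixel values are bounded; clamp still passes a subgradient.
        return observed.clamp(0.0, 1.0)


# Placeholder networks (not the architectures from the paper): the encoder
# maps a 512-dim audio feature to one LED intensity per video frame; the
# classifier maps 30 intensities (1 s at 30 fps) to 10 scene classes.
encoder = nn.Sequential(nn.Linear(512, 64), nn.ReLU(),
                        nn.Linear(64, 1), nn.Sigmoid())
classifier = nn.Sequential(nn.Linear(30, 32), nn.ReLU(), nn.Linear(32, 10))
physical = PhysicalLayer()

audio = torch.rand(8, 30, 512)    # batch of 1-s clips, 30 audio frames each
led = encoder(audio).squeeze(-1)  # (8, 30) LED intensities in [0, 1]
video = physical(led)             # simulated camera observation
logits = classifier(video)        # (8, 10) acoustic-scene scores
```

Since the injected noise enters additively, it does not block gradient flow, which is what makes joint training of both DNNs through the channel model possible in this kind of setup.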
