Abstract

We propose an end-to-end acoustic scene analysis framework with distributed sound-to-light conversion devices called Blinkies. A Blinky transmits sound information as the intensity of an on-board light-emitting diode (LED). A video camera can then easily collect acoustic information by capturing the LED intensities of multiple Blinkies distributed over a large area. However, the transmitted signal is band-limited by the video camera's frame rate, typically 30 frames per second. We aim to optimize the sound-to-light conversion process for acoustic scene analysis under this bandwidth constraint. During light-signal propagation in air, the signal is further degraded by physical effects such as light attenuation and noise. We model these physical constraints as differentiable physical layers, which allow us to train two deep neural networks (DNNs), one for sound-to-light conversion and one for acoustic scene analysis, in an end-to-end manner. Simulation experiments on acoustic scene analysis with a DCASE 2018 dataset show that the proposed framework achieves higher accuracy than the previous Blinky-based framework. This result suggests the suitability of Blinkies for acoustic scene analysis.
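
The core idea, two DNNs joined by a differentiable channel model so that gradients from the scene-classification loss can shape the sound-to-light encoding, can be sketched in a few lines of PyTorch. The sketch below is illustrative only, not the paper's implementation: the module names (Blinky, PhysicalLayer, SceneClassifier), the layer sizes, and the simple gain-plus-Gaussian-noise channel are all assumptions made for exposition.

```python
# Minimal sketch of the end-to-end pipeline described in the abstract.
# All names, sizes, and channel models below are illustrative assumptions.
import torch
import torch.nn as nn

class PhysicalLayer(nn.Module):
    """Differentiable stand-in for the optical channel:
    distance-dependent attenuation plus additive sensor noise."""
    def __init__(self, attenuation=0.5, noise_std=0.05):
        super().__init__()
        self.attenuation = attenuation
        self.noise_std = noise_std

    def forward(self, x):
        # Attenuate the LED intensity; inject noise only during training;
        # clamp to the valid intensity range [0, 1].
        x = self.attenuation * x
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        return x.clamp(0.0, 1.0)

class Blinky(nn.Module):
    """Sound-to-light encoder: maps a block of audio features to a
    band-limited LED intensity sequence (e.g., 30 samples per second)."""
    def __init__(self, n_in=1024, n_out=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 256), nn.ReLU(),
            nn.Linear(256, n_out), nn.Sigmoid(),  # intensities in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

class SceneClassifier(nn.Module):
    """Acoustic scene classifier operating on the received intensities."""
    def __init__(self, n_in=30, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.net(x)

# End-to-end training step: gradients flow from the classification loss
# through the differentiable physical layer back into the encoder.
encoder, channel, classifier = Blinky(), PhysicalLayer(), SceneClassifier()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()

features = torch.randn(8, 1024)      # dummy audio features (batch of 8)
labels = torch.randint(0, 10, (8,))  # dummy scene labels

logits = classifier(channel(encoder(features)))
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the channel is expressed as an ordinary differentiable module rather than a fixed preprocessing step, the encoder learns intensity codes that remain discriminative after attenuation and noise, which is the point of training the two networks jointly.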
