Abstract
This paper proposes methods for generating uniform, large-scale data from auralized MIDI music files for use with deep learning networks for polyphonic pitch perception and impulse response recognition. The pipeline includes synthesis and sound-source separation of large batches of multitrack MIDI files in non-real time, convolution with artificial binaural room impulse responses, and techniques for neural network training. Using ChucK, the individual tracks of each MIDI file, which contain the ground truth for pitch and other parameters, are processed concurrently with variable Synthesis ToolKit (STK) instruments, and the audio output is written to separate wave files in order to create multiple incoherent sound sources. Each track is then convolved with a measured or synthetic impulse response corresponding to the virtual position of the instrument in the room before all tracks are digitally summed. The resulting database contains both the symbolic description in the form of MIDI commands and the auralized music performances. A polyphonic pitch model based on an array of autocorrelation functions for individual frequency bands is used to train a neural network and analyze the data. [Work supported by an IBM AIRC grant and NSF BCS-1539276.]
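The two signal-processing steps in the abstract can be illustrated compactly. The following is a minimal sketch, not the authors' implementation: it assumes NumPy/SciPy and hypothetical file paths and parameters (track_paths, brir_paths, band_edges, max_lag), and shows (1) convolving each rendered instrument track with a binaural room impulse response matched to its virtual position before summing the tracks, and (2) computing normalized autocorrelation functions in individual frequency bands as a front end for the polyphonic pitch model.

```python
# Hedged sketch (not the authors' code) of the auralization and
# band-wise autocorrelation steps described in the abstract.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve, butter, sosfilt


def auralize_tracks(track_paths, brir_paths):
    """Convolve each mono track with its stereo BRIR and sum to a binaural mix.

    Assumes all files share one sample rate; BRIR wave files are (n, 2) arrays.
    """
    mix, sr = None, None
    for track_path, brir_path in zip(track_paths, brir_paths):
        sr, track = wavfile.read(track_path)              # mono instrument track
        _, brir = wavfile.read(brir_path)                 # binaural IR for its position
        track = track.astype(np.float64)
        left = fftconvolve(track, brir[:, 0].astype(np.float64))
        right = fftconvolve(track, brir[:, 1].astype(np.float64))
        binaural = np.stack([left, right], axis=1)
        if mix is None:
            mix = binaural
        else:                                             # zero-pad to equal length, then sum
            n = max(len(mix), len(binaural))
            mix = np.pad(mix, ((0, n - len(mix)), (0, 0)))
            mix += np.pad(binaural, ((0, n - len(binaural)), (0, 0)))
    return sr, mix


def bandwise_autocorrelation(frame, sr, band_edges, max_lag):
    """Return normalized autocorrelation vectors for one short mono frame.

    band_edges is a list of (low_hz, high_hz) pairs; output shape is
    (n_bands, max_lag), suitable as one input feature map for a network.
    """
    feats = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, frame)
        ac = np.correlate(band, band, mode="full")[len(band) - 1:]
        feats.append(ac[:max_lag] / (ac[0] + 1e-12))      # normalize by zero-lag energy
    return np.stack(feats)
```

Convolving before summation keeps the virtual sources spatially incoherent, matching the abstract's track-by-track auralization; the per-band autocorrelation vectors, computed frame by frame on the mixture, can then be stacked as training features, with the MIDI note data serving as ground-truth labels.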