Abstract
This paper proposes methods for generating uniform, large-scale data from auralized MIDI music files for use with deep learning networks for polyphonic pitch perception and impulse response recognition. The pipeline includes synthesis and sound-source separation of large batches of multitrack MIDI files in non-real time, convolution with artificial binaural room impulse responses, and techniques for neural network training. Using ChucK, the individual tracks of each MIDI file, which contain the ground truth for pitch and other parameters, are processed concurrently with variable Synthesis ToolKit (STK) instruments, and the audio output is written to separate wave files in order to create multiple incoherent sound sources. Each track is then convolved with a measured or synthetic impulse response corresponding to the virtual position of the instrument in the room before all tracks are digitally summed. The resulting database contains both the symbolic description in the form of MIDI commands and the auralized music performances. A polyphonic pitch model based on an array of autocorrelation functions for individual frequency bands is used to train a neural network and analyze the data. [Work supported by an IBM AIRC grant and NSF BCS-1539276.]
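The two signal-processing steps in the abstract can be illustrated compactly. The following is a minimal sketch, not the authors' implementation: it assumes NumPy/SciPy and hypothetical file paths and parameters (track_paths, brir_paths, band_edges, max_lag), and shows (1) convolving each rendered instrument track with a binaural room impulse response matched to its virtual position before summing the tracks, and (2) computing normalized autocorrelation functions in individual frequency bands as a front end for the polyphonic pitch model.

```python
# Hedged sketch (not the authors' code) of the auralization and
# band-wise autocorrelation steps described in the abstract.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve, butter, sosfilt


def auralize_tracks(track_paths, brir_paths):
    """Convolve each mono track with its stereo BRIR and sum to a binaural mix.

    Assumes all files share one sample rate; BRIR wave files are (n, 2) arrays.
    """
    mix, sr = None, None
    for track_path, brir_path in zip(track_paths, brir_paths):
        sr, track = wavfile.read(track_path)              # mono instrument track
        _, brir = wavfile.read(brir_path)                 # binaural IR for its position
        track = track.astype(np.float64)
        left = fftconvolve(track, brir[:, 0].astype(np.float64))
        right = fftconvolve(track, brir[:, 1].astype(np.float64))
        binaural = np.stack([left, right], axis=1)
        if mix is None:
            mix = binaural
        else:                                             # zero-pad to equal length, then sum
            n = max(len(mix), len(binaural))
            mix = np.pad(mix, ((0, n - len(mix)), (0, 0)))
            mix += np.pad(binaural, ((0, n - len(binaural)), (0, 0)))
    return sr, mix


def bandwise_autocorrelation(frame, sr, band_edges, max_lag):
    """Return normalized autocorrelation vectors for one short mono frame.

    band_edges is a list of (low_hz, high_hz) pairs; output shape is
    (n_bands, max_lag), suitable as one input feature map for a network.
    """
    feats = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, frame)
        ac = np.correlate(band, band, mode="full")[len(band) - 1:]
        feats.append(ac[:max_lag] / (ac[0] + 1e-12))      # normalize by zero-lag energy
    return np.stack(feats)
```

Convolving before summation keeps the virtual sources spatially incoherent, matching the abstract's track-by-track auralization; the per-band autocorrelation vectors, computed frame by frame on the mixture, can then be stacked as training features, with the MIDI note data serving as ground-truth labels.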