Abstract

Estimating the pitch, or fundamental frequency, of a monophonic audio signal is a core task in computational audio analysis, with many downstream applications such as automatic transcription. The current general-purpose state-of-the-art method for pitch detection is CRéPE (Convolutional Representation for Pitch Estimation), a deep convolutional neural network that estimates the fundamental frequency of an audio signal directly from the waveform. However, CRéPE and other general pitch detection methods do not perform well on steelpan audio, likely because the steelpan's complex spectral characteristics differentiate it timbrally from other sound sources. We combine a deep convolutional neural network architecture based on CRéPE with a training dataset of audio from several distinct-sounding tenor steelpans to achieve improved tenor steelpan pitch detection directly from the audio signal. We assess our model's ability to generalize by evaluating it on a test dataset that includes samples from steelpans not represented in the training set, and we compare these results to CRéPE's performance on the same test dataset.
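For context, a minimal sketch of the pitch estimation task the abstract describes: given a monophonic waveform, recover its fundamental frequency. The sketch below uses a classical autocorrelation baseline on a synthetic tone, not the CNN approach of CRéPE or of this paper; the function name, frequency bounds, and test signal are illustrative assumptions.

```python
import numpy as np

def estimate_pitch_autocorr(signal, sample_rate, fmin=50.0, fmax=2000.0):
    """Estimate fundamental frequency via autocorrelation peak picking.

    A classical baseline for monophonic pitch detection, not the
    learned CNN method described in the abstract.
    """
    signal = signal - np.mean(signal)               # remove DC offset
    corr = np.correlate(signal, signal, mode="full")
    corr = corr[len(corr) // 2:]                    # keep non-negative lags
    lag_min = int(sample_rate / fmax)               # shortest period searched
    lag_max = int(sample_rate / fmin)               # longest period searched
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

# Sanity check on a synthetic 440 Hz sine (100 ms at 16 kHz)
sr = 16000
t = np.arange(sr // 10) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
print(estimate_pitch_autocorr(tone, sr))
```

The estimate is quantized to integer lags, so it lands near, not exactly on, 440 Hz; steelpan notes are much harder because their strong, inharmonically mixed partials can mislead exactly this kind of periodicity cue.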
