Abstract

Estimating fundamental frequency, or pitch, is a core task in computational audio analysis with a variety of applications. Steelpan audio has proven difficult for the general-purpose pitch detection methods CREPE and pYIN. CREPE uses a deep convolutional neural network to estimate pitch directly from the audio signal, while pYIN is a digital signal processing-based approach. Audio feature extraction uses digital signal processing techniques to extract low-level information from audio signals. Combining audio feature extraction with logistic regression is currently the best-performing steelpan pitch estimation method, but the efficacy of using deep neural networks in lieu of traditional machine learning algorithms has yet to be determined. This paper compares the performance and computational requirements of a deep neural network-based architecture against logistic regression and the established pYIN and CREPE pitch detection methods to determine which method is the most accurate and efficient. All methods are evaluated on a test dataset of one-hit audio samples from several distinct-sounding steelpans. Generalization to other steelpans is assessed by including test samples from steelpans that have no samples in the training dataset.
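To make the "audio feature extraction plus logistic regression" idea concrete, the following is a minimal sketch and not the pipeline evaluated in the paper. It assumes synthetic decaying tones as stand-ins for one-hit steelpan samples, a pooled log-magnitude spectrum as the feature vector, and a hand-written multinomial logistic (softmax) regression in NumPy. The note set, sample rate, clip length, and all function names are illustrative assumptions.

```python
import numpy as np

SR, N = 22050, 4096  # assumed sample rate (Hz) and clip length (samples)
NOTES = {"C4": 261.63, "E4": 329.63, "G4": 392.00, "C5": 523.25}  # assumed classes

def synth_hit(f0, rng):
    """Synthetic stand-in for a one-hit sample: decaying fundamental plus two partials."""
    t = np.arange(N) / SR
    sig = np.zeros(N)
    for k, amp in ((1, 1.0), (2, 0.6), (3, 0.3)):
        sig += amp * np.sin(2 * np.pi * k * f0 * t + rng.uniform(0, 2 * np.pi))
    return np.exp(-6.0 * t) * sig + 0.01 * rng.standard_normal(N)

def extract_features(signal):
    """Log-magnitude spectrum of the Hann-windowed clip, max-pooled into 80 bands."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(N)))[:320]  # bins up to about 1.7 kHz
    return np.log1p(mag.reshape(80, 4).max(axis=1))

def make_dataset(n_per_note, rng):
    """Build features and integer labels, detuning each hit by up to 1 percent."""
    X, y = [], []
    for label, f0 in enumerate(NOTES.values()):
        for _ in range(n_per_note):
            detuned = f0 * (1 + rng.uniform(-0.01, 0.01))
            X.append(extract_features(synth_hit(detuned, rng)))
            y.append(label)
    return np.array(X), np.array(y)

def fit_softmax_regression(X, y, n_classes, lr=0.02, steps=500):
    """Multinomial logistic regression trained by batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(X)
        b -= lr * (P - Y).mean(axis=0)
    return W, b

rng = np.random.default_rng(0)
X_train, y_train = make_dataset(40, rng)
X_test, y_test = make_dataset(10, rng)

# Standardize with training statistics only, then fit and evaluate.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
W, b = fit_softmax_regression((X_train - mu) / sigma, y_train, len(NOTES))
pred = np.argmax(((X_test - mu) / sigma) @ W + b, axis=1)
accuracy = float((pred == y_test).mean())
print(f"test accuracy: {accuracy:.2f}")
```

Real steelpan notes have far richer and less harmonic spectra than these synthetic tones, so this only illustrates the shape of the approach: fixed spectral features feeding a linear classifier over note classes, in contrast to CREPE's learned features or pYIN's signal-processing estimate.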
