Abstract
Transcription from audio to musical representation is a challenging problem for Query by Humming (QBH) systems. In this paper, we propose a two-step note transcription process: a speech recognizer first segments the notes, and signal processing then robustly locates them and captures their pitch and duration in the hummed audio input. In contrast to most Hidden Markov Model-based approaches to QBH, which model and classify humming with a single universal model, we design a flexible speech recognizer that accepts different types of humming sounds in the input, providing efficient and accurate note segmentation and transcription. To improve the system's performance, we use novel statistical energy and pitch analyses to correct potential insertion and deletion errors, and we evaluate our algorithm with precision and recall tests. Using a large, previously collected database, we test various system configurations and report note segmentation results with and without the proposed augmentations. The augmented system robustly recognizes the location of hummed notes, with an F-measure (combining precision and recall) of 0.84. As a second validation, we use the output of the transcription system for melody retrieval and find, for a database of 1000 melodies, 76% retrieval accuracy with automatically extracted queries and 83% with manually transcribed queries. Sensitivity analysis shows that, although the positions of the hummed notes can be located accurately, incorrect segmentation results can have a negative effect on the retrieval performance of the QBH system.
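The precision, recall, and F-measure used to score note segmentation can be illustrated with a minimal sketch. It assumes that a detected note onset counts as correct when it lies within a fixed tolerance of an unmatched reference onset. The 50 ms tolerance, the greedy one-to-one matching, and the function names are illustrative assumptions, not the evaluation protocol of the paper.

```python
def match_onsets(reference, detected, tolerance=0.05):
    """Count detected onsets that pair one-to-one with a reference onset.

    Times are in seconds. The tolerance window is an assumed value.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    hits = 0
    i = 0  # index of the next unmatched reference onset
    for d in detected:
        # Skip reference onsets that are too early to match this detection.
        while i < len(reference) and reference[i] < d - tolerance:
            i += 1
        if i < len(reference) and abs(reference[i] - d) <= tolerance:
            hits += 1
            i += 1  # each reference onset can be matched only once
    return hits


def precision_recall_f(reference, detected, tolerance=0.05):
    """Return (precision, recall, F-measure) for detected note onsets."""
    hits = match_onsets(reference, detected, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical onsets: two inserted notes (1.30 s, 2.40 s), one deleted (2.00 s).
reference_onsets = [0.50, 1.00, 1.50, 2.00]
detected_onsets = [0.52, 1.03, 1.30, 1.51, 2.40]
p, r, f = precision_recall_f(reference_onsets, detected_onsets)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy example, inserted notes lower precision and deleted notes lower recall, which is why correcting both kinds of error raises the F-measure.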