Abstract

Real-time voice-to-MIDI conversion is a challenging task, and its main difficulty is pitch tracking. Frequency tracking of the human voice can be inaccurate and computationally expensive because of the spectral complexity of vocal sounds. Moreover, in microphone-based systems, environmental noise and neighbouring sounds further degrade the accuracy of the frequency tracking. Another issue with converting voice into MIDI is the presence of non-singing phonemes: because every sound picked up by the microphone passes through the conversion system, any vocal sound or voiced phoneme produced by the user results in MIDI output. This research addresses these issues with a novel experimental method that uses electroglottography, known to the medical community as EGG, as the source for the pitch tracking operation. Electroglottography improves both the accuracy of the tracking and the ease of processing, as it measures vocal fold activity directly and so bypasses contamination from other sound sources. Furthermore, to address the issue of non-singing phonemes, the proposed method uses neural networks to classify, in real time, the vocal act produced by the user.
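To make the pitch-to-MIDI step concrete, the following is a minimal sketch, not the method used in this research. The abstract does not say which pitch tracking algorithm is applied to the EGG signal, so the autocorrelation estimator, the function names `estimate_f0` and `freq_to_midi`, the search range, and the synthetic sine wave standing in for an EGG frame are all assumptions made for illustration. Only the frequency-to-note mapping (A4 = 440 Hz = MIDI note 69, twelve semitones per octave) is the standard convention.

```python
import math


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def estimate_f0(samples, sample_rate, fmin=60.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame, in Hz.

    Assumed approach: pick the lag with the largest raw autocorrelation
    inside the lag range implied by [fmin, fmax]. Returns None when no
    lag has a positive score (for example, on a silent frame).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Hypothetical input: a clean 220 Hz sine standing in for an EGG frame.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 220.0 * n / SAMPLE_RATE) for n in range(800)]

f0 = estimate_f0(frame, SAMPLE_RATE)
note = freq_to_midi(f0) if f0 is not None else None
print(f0, note)
```

Because the lag is an integer number of samples, the frequency estimate is quantised; rounding to the nearest semitone absorbs that error in this example, where the 220 Hz frame maps to MIDI note 57 (A3). A near-periodic signal with little spectral clutter, which is the property the abstract attributes to EGG, is the case in which such a simple estimator is most likely to behave well.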
