Vowel duration dependent hidden Markov model for automatic lyrics recognition

Shohei Awata,Shinji Sako,Tadashi Kitamura

doi:10.1121/1.4971035

Abstract

Recently, due to the spread of music distribution service, a large amount of music is available on the Internet. Accordingly, it is generally increasing the demand of music information retrieval (MIR). In the field of MIR research, there are several researches to extract meaningful information from music audio signals. However, automatic lyrics recognition is still a challenging problem because the variation of singing voice is much larger than that of speaking voice and a large database of singing voice is not available. In the relevant study, lyrics recognition was performed by extending the framework of speech recognition using hidden Markov model (HMM). However, accuracy rate was not sufficient. To recognize singing voice precisely, one promising approach is utilizing musical features. This study considers the task of recognizing syllable from a cappella singing voice. To respond to the variation of the length of a phoneme, we construct the duration dependent HMM. A large database of singing voice is essential for training the acoustic model. We use synthetic singing voice by HMM based singing voice synthesis system to solve the lack of the database of a cappella singing voice. We confirmed the effectiveness of our method.

Full Text