Abstract

This work reports the development of a MIDI-to-Singing song synthesis system that produces audio files from MIDI data and arbitrary Romaji lyrics in Japanese. The system relies on Flinger (Festival singer) for singing voice synthesis. The MIDI-to-Singing system was originally developed for English; a Japanese version was derived from it by applying Japanese pronunciation rules. For a language transfer of this kind in Festival-based singing synthesis, the two major tasks are modifying the phoneset and modifying the lexicon. In principle, MIDI-to-Singing song synthesis can create singing voices in many languages, but no Japanese Festival diphone voice is currently available. We therefore used a voice transformation model in Festival to develop Japanese MIDI-to-Singing synthesis. A song listening experiment was conducted to evaluate the system, and the results showed that the synthesized singing voice migrated successfully from English to Japanese with high voice quality.

Highlights

  • The goal of this research is to synthesize a natural-sounding Japanese singing voice from an English Text-to-Speech voice

  • By MIDI-to-Singing, we mean the production of a humanlike singing voice from music given in MIDI format

  • The MIDI-to-Singing system extends speech-to-singing synthesis, which converts a speaking voice reading the lyrics of a song into a singing voice given the song's musical score

Summary

Introduction

The goal of this research is to synthesize a natural-sounding Japanese singing voice from an English Text-to-Speech voice. The MIDI-to-Singing system extends speech-to-singing synthesis, which converts a speaking voice reading the lyrics of a song into a singing voice given the song's musical score. A concatenation-based singing synthesizer, without a score editor environment for end users, was already proposed by Macon et al. [3] in 1992. They released an open-source implementation of this method, called Flinger [4]. It was written by Mike Macon and is based on the Festival Speech Synthesis System [5], developed at the University of Edinburgh.
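
To make the inputs of a MIDI-to-Singing system concrete, the following is a minimal sketch of how Romaji lyrics and MIDI notes might be paired before synthesis. It is not taken from Flinger or Festival: the function names (split_romaji, midi_to_hz, align), the regular expression, and the one-note-per-syllable assumption are all hypothetical and chosen for illustration only. The only standard ingredient is the equal-temperament conversion from MIDI note number to frequency (note 69 = 440 Hz).

```python
import re

# Hypothetical illustration; none of these names come from Flinger or Festival.
# A syllable-like unit is either a syllabic "n" (an "n" not followed by a
# vowel or "y"), or an optional run of consonants followed by one vowel.
UNIT = re.compile(r"n(?![aiueoy])|[bcdfghjklmnpqrstvwxyz]*[aiueo]")


def split_romaji(text):
    """Split Romaji lyrics into syllable-like units (simplified rules)."""
    return UNIT.findall(text.lower())


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def align(lyrics, notes):
    """Pair each syllable-like unit with one (note_number, duration) tuple.

    Assumes exactly one note per unit, which is a simplification made
    for this sketch rather than a property of the system described here.
    """
    units = split_romaji(lyrics)
    if len(units) != len(notes):
        raise ValueError(
            f"{len(units)} lyric units but {len(notes)} notes"
        )
    return [(u, midi_to_hz(n), d) for u, (n, d) in zip(units, notes)]


if __name__ == "__main__":
    # Three notes (MIDI number, duration in seconds) for three units.
    for unit, hz, dur in align("sakura", [(69, 0.5), (69, 0.5), (71, 1.0)]):
        print(f"{unit}\t{hz:.1f} Hz\t{dur:.2f} s")
```

A real system would also need to handle long vowels, geminate consonants, and melisma (several notes on one syllable), which this sketch deliberately ignores.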

Derive a Japanese diphone voice
Modifying a lexicon
A MIDI-to-Singing song synthesis
Score editor
Discussions
Conclusions