Building a Japanese MIDI-to-Singing song synthesis using an English male voice

Hung-Che Shen, Cheng-Chi Wang

doi:10.1051/matecconf/201820102006

Abstract

This work reports the development of a MIDI-to-Singing song synthesis system that produces audio files from MIDI data and arbitrary Japanese lyrics written in Romaji. The system relies on Flinger (Festival singer) for singing voice synthesis. The MIDI-to-Singing system was originally developed for English; using a set of Japanese pronunciation rules, we derived a Japanese MIDI-to-Singing synthesis system from it. For a language transfer of Festival-based synthesized singing, the two major tasks are modifying the phoneset and the lexicon. In principle, MIDI-to-Singing song synthesis can create singing voices in many languages, but no Japanese Festival diphone voice is currently available. We therefore used a voice transformation model in Festival to develop Japanese MIDI-to-Singing synthesis. A song listening experiment was conducted, and the results showed that the synthesized singing voice was successfully migrated from English to Japanese with high voice quality.

Highlights

The goal of this research is to synthesize natural-sounding Japanese singing from an English Text-to-Speech voice
By MIDI-to-Singing, we mean the production of a humanlike singing voice from music given in MIDI format
The MIDI-to-Singing system is an extension of speech-to-singing synthesis, which converts a speaking voice reading the lyrics of a song into a singing voice, given the song's musical score

Summary

Introduction

The goal of this research is to synthesize natural-sounding Japanese singing from an English Text-to-Speech voice. The MIDI-to-Singing system is an extension of speech-to-singing synthesis, which converts a speaking voice reading the lyrics of a song into a singing voice, given the song's musical score. It should be noted that a concatenation-based singing synthesizer, without a score editor environment for end users, was already proposed by Macon et al. [3] in 1992. They released an open-source implementation of this method, called Flinger [4]. It was written by Mike Macon and is based on the Festival Speech Synthesis System [5], developed at the University of Edinburgh.
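
To make the input side of such a system more concrete, the following Python sketch splits Romaji lyrics into mora-like units and pairs each unit with a MIDI note and its target pitch. It is only an illustration under stated assumptions: the function names (`split_romaji`, `midi_note_to_hz`, `align_lyrics`) are hypothetical and are not part of Flinger or Festival, the splitting rule is a simplification (a doubled consonant stays attached to the following unit), and the one-unit-per-note alignment is assumed rather than taken from the paper. The pitch formula is the standard equal-temperament conversion with A4 (MIDI note 69) at 440 Hz.

```python
import re

# Hypothetical helpers for illustration only; not part of Flinger or Festival.
# A unit is either a moraic "n" (an "n" not followed by a vowel or "y"),
# or an optional run of consonants followed by one vowel.
_UNIT = re.compile(r"n(?![aeiouy])|[bcdfghjklmnpqrstvwxyz]*[aeiou]")


def split_romaji(lyrics: str) -> list[str]:
    """Split Romaji text into mora-like units (simplified rule)."""
    return _UNIT.findall(lyrics.lower())


def midi_note_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def align_lyrics(lyrics: str, notes: list[int]) -> list[tuple[str, int, float]]:
    """Pair each unit with one MIDI note (assumed one unit per note)."""
    units = split_romaji(lyrics)
    if len(units) != len(notes):
        raise ValueError(f"{len(units)} units but {len(notes)} notes")
    return [(u, n, midi_note_to_hz(n)) for u, n in zip(units, notes)]


if __name__ == "__main__":
    # Three units sung on A4, A4, B4.
    for unit, note, hz in align_lyrics("sakura", [69, 69, 71]):
        print(f"{unit}\tnote={note}\t{hz:.1f} Hz")
```

A real system would additionally need note durations from the MIDI file and a mapping from each unit to the phones of the available voice, which is where the phoneset and lexicon modifications come in.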

Derive a Japanese diphone voice

Modifying a lexicon

A MIDI-to-Singing song synthesis

Score editor

Discussions

Conclusions