Use of DP matching for evaluating synthesized speech

Takashi Saito, Toyohisa Kaneko, Yasuhiro Matsuda

doi:10.1121/1.2023055

Abstract

We examined the applicability of DP (dynamic programming) matching as a tool for evaluating methods of connecting synthesis units. In speech synthesis by rule, speech quality depends greatly on how the synthesis units (e.g., syllables) are connected. The most widely used evaluation method is judgment by human listeners. Although this is justifiable as the final judgment, it is time-consuming and fairly listener-dependent. The method proposed here uses the residual error of DP matching, a technique widely known in speech recognition. To select the parameters of a particular connecting method, we record naturally spoken words (presumably continuous) and then perform DP matching between the natural speech and the synthetic speech. The parameter set that produces the least residual error is selected. We devised a form of DP matching particularly suited to this purpose. We found that the parameters optimized with this method agreed well with those obtained by human listening, and we believe that the proposed automated method can substitute for human listeners in sizable classes of evaluation problems.
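The selection procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' specially devised DP matching, so the code below substitutes a textbook dynamic time warping recurrence with a Euclidean frame distance and a length-normalized residual. The function names (frame_distance, dp_residual, select_parameters), the feature representation (lists of per-frame feature vectors), and the normalization are all assumptions made for illustration, not details taken from the paper.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_residual(natural, synthetic):
    """Accumulated distance along the best DP alignment path, divided by n + m.

    This is a generic DTW recurrence, standing in for the paper's own
    (unspecified) variant of DP matching.
    """
    n, m = len(natural), len(synthetic)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j]: minimum accumulated distance aligning natural[:i] with synthetic[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(natural[i - 1], synthetic[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # natural advances, synthetic frame repeated
                cost[i][j - 1],      # synthetic advances, natural frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)


def select_parameters(natural, candidates):
    """Return the key of the candidate whose synthetic frames give the least residual.

    `candidates` maps a hypothetical parameter-set label to the feature frames of
    the speech synthesized with that parameter set.
    """
    return min(candidates, key=lambda name: dp_residual(natural, candidates[name]))
```

In use, one would extract the same kind of feature frames from the recorded natural word and from each synthetic version, then call select_parameters(natural_frames, {"set_a": frames_a, "set_b": frames_b}). The returned label identifies the parameter set with the least residual error under this simplified measure.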
