Automatic Text-Independent Artifact Detection, Localization, and Classification in the Synthetic Speech

Highlights

  • The synthetic speech produced by text-to-speech (TTS) systems is increasingly used to make dialogue management in human-machine interaction more effective

  • Three basic comparison experiments were performed within the research described in this paper; the first one verifies the functionality of the analysis of variance (ANOVA)-based artifact detector using synthetic speech produced by the Czech TTS system (a minimal sketch of such a detector follows this list)

  • Only one male and one female voice are implemented in the tested Czech TTS system, which works with the unit selection (USEL)-based synthesis method [2, 6, 7]
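
The detector internals from the paper are not reproduced here; the following is only a minimal sketch of an ANOVA-style check over frame-level features, assuming short-time log energy as the analysed feature. The function names (frame_energies, is_artifact_region), the framing parameters, and the 0.05 significance level are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of an ANOVA-based artifact check on frame-level features.
    # Assumptions (not from the paper): short-time log energy as the feature,
    # a 0.05 significance level, and illustrative function and parameter names.
    import numpy as np
    from scipy.stats import f_oneway

    def frame_energies(signal, frame_len=512, hop=256):
        """Short-time log energy per frame (illustrative feature)."""
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, hop)]
        return np.array([np.log(np.sum(np.asarray(f, dtype=float) ** 2) + 1e-12)
                         for f in frames])

    def is_artifact_region(signal, start, end, alpha=0.05, frame_len=512, hop=256):
        """Flag the sample range [start, end) as suspect if its frame-energy
        distribution differs significantly from the surrounding context."""
        inside = frame_energies(signal[start:end], frame_len, hop)
        context = frame_energies(np.concatenate([signal[:start], signal[end:]]),
                                 frame_len, hop)
        if len(inside) < 2 or len(context) < 2:
            return False
        f_stat, p_value = f_oneway(inside, context)
        return p_value < alpha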


Summary

Introduction

The synthetic speech produced by text-to-speech (TTS) systems is increasingly used to make dialogue management in human-machine interaction more effective. The most widely used approach is corpus-based speech synthesis using unit selection (USEL) [1], i.e. the selection of the largest suitable segments from natural speech according to various phonetic, prosodic, and positional criteria, commonly known as the target cost. Automatic artifact detection, localization, and classification can help in the whole process of TTS system creation. This holds especially for artifacts caused by wrong annotation or those found in an already generated synthetic sentence. If their location is known, they can be eliminated in post-processing or directly during unit selection as a component of the concatenation cost, as sketched below.
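
To illustrate that last point, the sketch below shows how a detected-artifact penalty could be folded into the concatenation cost alongside the usual join mismatch, next to the target cost used for candidate selection. The Unit structure, the weights, and the artifact_score field are hypothetical illustrations, not the tested system's actual cost functions.

    # Sketch of an artifact penalty entering the unit-selection concatenation
    # cost. The data structure, weights, and artifact_score hook are assumed
    # for illustration only.
    from dataclasses import dataclass

    @dataclass
    class Unit:
        features: tuple        # e.g. (pitch, duration, energy) of the candidate
        artifact_score: float  # 0.0 = clean, 1.0 = likely artifact (from a detector)

    def target_cost(unit: Unit, target_features: tuple, w: float = 1.0) -> float:
        """Distance between candidate features and the required target features."""
        return w * sum((u - t) ** 2 for u, t in zip(unit.features, target_features))

    def concatenation_cost(prev: Unit, cur: Unit,
                           w_join: float = 1.0, w_artifact: float = 5.0) -> float:
        """Spectral/prosodic mismatch at the join plus a penalty that steers the
        search away from candidates flagged by the artifact detector."""
        join_mismatch = sum((a - b) ** 2 for a, b in zip(prev.features, cur.features))
        return w_join * join_mismatch + w_artifact * cur.artifact_score

A relatively large weight on the penalty (w_artifact above) lets the dynamic-programming search over the candidate lattice avoid flagged units without removing them from the inventory outright.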

