Abstract

This paper presents the design of parametric models that estimate the quality of synthesized speech transmitted over IP networks. Two machine learning techniques, Genetic Programming and the Random Neural Network, were used to design the models. A set of quality-affecting parameters serves as the input to the models, which estimate the quality of synthesized speech transmitted over IP networks (VoIP environment). The performance results obtained for the designed models validate both genetic programming and the random neural network as powerful techniques with good accuracy and generalization ability, which makes them promising candidates for estimating the quality of this type of speech in this environment. The developed parametric models can help network operators and service providers in the planning phase or early development stage of telecommunication services based on synthesized speech.
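The abstract reports "good accuracy" without naming the metric. Estimated quality scores are commonly compared with subjective MOS values using Pearson's correlation coefficient and the root-mean-square error (RMSE). The following is a minimal sketch that assumes those two metrics; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(subjective: Sequence[float], estimated: Sequence[float]) -> float:
    """Pearson correlation between subjective and estimated quality scores."""
    if len(subjective) != len(estimated) or len(subjective) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(subjective)
    mean_s = sum(subjective) / n
    mean_e = sum(estimated) / n
    # Covariance numerator and the two spread terms (sums of squared deviations).
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(subjective, estimated))
    spread_s = math.sqrt(sum((s - mean_s) ** 2 for s in subjective))
    spread_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated))
    if spread_s == 0 or spread_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_s * spread_e)


def rmse(subjective: Sequence[float], estimated: Sequence[float]) -> float:
    """Root-mean-square error between subjective and estimated scores."""
    if len(subjective) != len(estimated) or len(subjective) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((s - e) ** 2 for s, e in zip(subjective, estimated))
    return math.sqrt(squared / len(subjective))


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    subjective_mos = [1.5, 2.8, 3.4, 4.1]
    estimated_mos = [1.7, 2.6, 3.5, 4.0]
    print(f"Pearson r = {pearson_r(subjective_mos, estimated_mos):.3f}")
    print(f"RMSE      = {rmse(subjective_mos, estimated_mos):.3f}")
```

A higher correlation and a lower RMSE indicate closer agreement with the subjective scores. Checking both metrics on data not used to design a model is one way to gauge its generalization ability.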

Highlights

  • A speech quality assessment process is useful for network operators and service providers to evaluate the quality of voice services offered by current telecommunication networks

  • We present novel parametric models for non-intrusive estimation of speech quality, based on biologically inspired machine learning techniques: Genetic Programming (GP) and the Random Neural Network (RNN)

  • The designed parametric models estimate the quality of synthesized speech transmitted over IP networks (VoIP environment)

Summary

Introduction

A speech quality assessment process is useful for network operators and service providers to evaluate the quality of voice services offered by current telecommunication networks. Subjective testing relies on a sufficiently large group of human subjects who listen to given samples and assign each one an opinion score on a scale from 1 “bad quality” to 5 “excellent quality”; averaging these scores gives the MOS (Mean Opinion Score). This approach is impractical in real conditions because of the number of subjects who have to participate in a test, the time it takes, its high cost, etc.

Intrusive models (e.g. ITU-T PESQ) compare two types of signals: they evaluate the quality of a degraded (output) speech signal by comparing it with the corresponding reference (input) speech signal. Parametric non-intrusive models, in contrast, estimate the quality of speech transmission from input parameters that characterize this transmission from a quality point of view [1] and [2].
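The MOS of a speech sample is the arithmetic mean of the opinion scores that the listeners assign to it on the 1 to 5 scale. The following is a minimal sketch of that calculation; the function name and the ratings are hypothetical, and the range check assumes the five-point scale described above.

```python
from statistics import fmean
from typing import Iterable

# Opinion scale from the text: 1 = "bad quality" ... 5 = "excellent quality".
SCALE_MIN, SCALE_MAX = 1, 5


def mean_opinion_score(ratings: Iterable[int]) -> float:
    """Return the MOS, i.e. the arithmetic mean of listeners' opinion scores."""
    scores = list(ratings)
    if not scores:
        raise ValueError("at least one rating is required")
    for score in scores:
        # Reject ratings that fall outside the 1-5 opinion scale.
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"rating {score} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    return fmean(scores)


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for one speech sample.
    print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

Every MOS value in a subjective test requires a panel of listeners like this one, which is why the text calls the approach impractical and why models that estimate the score are attractive.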

