Abstract
Speech quality is often measured via subjective testing, or with objective estimators of mean opinion score (MOS) such as ViSQOL or POLQA. Typical MOS-estimation frameworks use signal-level features but do not use language features, which have been shown to affect opinion scores. If there is a conditional dependence between score and language given these signal features, introducing language and rater predictors should provide a marginal improvement in predictions. The proposed method uses Bayesian models that predict the individual opinion score instead of MOS. Several models that test various combinations of predictors were used, including predictors that capture signal features, such as frequency band similarity, as well as features related to the listener, such as language and rater indices. The models are fit to the ITU-T P. Supplement 23 dataset, and posterior samples are drawn from distributions of both the model parameters and the resulting opinion score outcomes. These models are compared to MOS models by integrating over posterior samples per utterance. An experiment was conducted by ablating different predictors for several types of Bayesian hierarchical models (including ordered logistic and truncated normal individual score distributions, as well as MOS distributions) to find the marginal improvement from language and rater. The models that included language and/or rater obtained significantly lower errors (0.601 versus 0.684 root-mean-square error (RMSE)) and higher correlation. Additionally, individual rater models matched or exceeded the performance of MOS models.
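As a rough illustration of how individual-score models can be reduced to MOS-level predictions for comparison, the sketch below averages posterior draws of predicted scores per utterance and evaluates them against observed MOS with RMSE and Pearson correlation. The arrays here are random placeholders, not the paper's data or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder posterior draws of predicted individual scores:
# one row per posterior sample, one column per utterance.
posterior_scores = rng.integers(1, 6, size=(4000, 200))
observed_mos = rng.uniform(1, 5, size=200)  # placeholder observed MOS per utterance

# Integrate over posterior samples per utterance: the posterior mean of the
# predicted individual scores serves as the model's MOS estimate.
predicted_mos = posterior_scores.mean(axis=0)

# Compare against observed MOS with RMSE and Pearson correlation.
rmse = np.sqrt(np.mean((predicted_mos - observed_mos) ** 2))
corr = np.corrcoef(predicted_mos, observed_mos)[0, 1]
print(f"RMSE = {rmse:.3f}, Pearson r = {corr:.3f}")
```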
Highlights
Measuring and estimating speech quality is an important task for many fields
We propose to use a Bayesian hierarchical model of an ordered categorical distribution to model individual opinion scores based on speech, listener, and language features
When subjective tests are performed, it is common to see the results reported with the sample mean μ specified along with a 95% or 99% symmetric confidence interval [μ − cσ, μ + cσ], where σ is the sample standard deviation of the opinion score and c is a constant
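A minimal sketch of the reporting convention described in the last highlight, using placeholder opinion scores for one condition and a placeholder value of the constant c:

```python
import numpy as np

scores = np.array([4, 3, 5, 4, 4, 2, 5, 3])  # placeholder opinion scores for one condition
c = 1.96                                     # placeholder constant chosen by the experimenter

mu = scores.mean()           # sample mean, i.e. the MOS
sigma = scores.std(ddof=1)   # sample standard deviation of the opinion scores
low, high = mu - c * sigma, mu + c * sigma
print(f"MOS = {mu:.2f}, interval = [{low:.2f}, {high:.2f}]")
```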
Summary
Measuring and estimating speech quality is an important task for many fields. In speech synthesis and coding [1], subjective measurements of quality can be used to validate novel designs, and may be especially useful when traditional objective metrics like SNR diverge from human perception. The absolute category rating (ACR) test asks raters to measure the quality of speech utterances under various test conditions by assigning a score from 1 (bad) to 5 (excellent), with recommendations for conducting the test given in ITU-T P.800 [2]. The mean opinion score (MOS) can be calculated by aggregating the scores over each utterance or over all the utterances within a given condition. MOS is a standard measurement that is used in research and development of many speech applications such as codecs and speech enhancement [3].
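To make the modeling idea from the abstract and highlights concrete, the sketch below writes out an ordered-logistic observation model over the five ACR categories, where the latent quality combines a signal-level feature with per-language and per-rater offsets. All parameter values are hypothetical stand-ins for quantities the paper infers with Bayesian sampling; this is an illustration of the likelihood structure, not the fitted model.

```python
import numpy as np

def ordered_logistic_pmf(eta, cutpoints):
    """P(score = k) for k = 1..5 given latent quality eta and 4 ordered cutpoints."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    cdf = np.concatenate(([0.0], sigmoid(cutpoints - eta), [1.0]))  # cumulative P(score <= k)
    return np.diff(cdf)                                             # per-category probabilities

# Hypothetical parameters (in the paper these would be inferred by posterior sampling).
cutpoints = np.array([-2.0, -0.5, 0.5, 2.0])        # ordered thresholds between the 5 ACR categories
beta_signal = 1.2                                   # weight on a signal-level similarity feature
lang_effect = {"en": 0.0, "fr": -0.2, "ja": 0.1}    # hypothetical per-language offsets
rater_effect = {0: 0.3, 1: -0.1}                    # hypothetical per-rater offsets

def score_probabilities(signal_feature, language, rater):
    # Latent quality combines a signal-level feature with language and rater effects.
    eta = beta_signal * signal_feature + lang_effect[language] + rater_effect[rater]
    return ordered_logistic_pmf(eta, cutpoints)

probs = score_probabilities(signal_feature=0.8, language="fr", rater=1)
for k, p in enumerate(probs, start=1):
    print(f"P(score = {k}) = {p:.3f}")
```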