Speech quality assessment with WARP‐Q: From similarity to subsequence dynamic time warp cost

Wissam A Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

doi:10.1049/sil2.12151

Abstract

Speech coding has been shown to achieve good speech quality using either waveform matching or parametric reconstruction. For very low bit rate streams, recently developed generative speech models can reconstruct high-quality wideband speech from the bit streams of standard parametric encoders at less than 3 kb/s. Generative codecs produce high-quality speech by synthesising it with a deep neural network (DNN) driven by the parametric input. Existing objective speech quality models (e.g., ViSQOL and POLQA) cannot accurately evaluate the quality of coded speech from generative models, because they penalise signal differences that are not apparent in subjective listening test results. This paper presents WARP-Q, a full-reference objective speech quality metric that uses a dynamic time warping cost computed on Mel-frequency cepstral coefficient (MFCC) representations of the signals. It is robust to the perceptually small signal changes introduced by low bit rate neural vocoders. An evaluation using waveform matching, parametric, and generative neural vocoder-based codecs, as well as channel and environmental noise, shows that WARP-Q achieves better correlation and codec quality ranking for novel codecs than traditional metrics. It is also versatile enough to capture other types of degradation, such as additive noise and transmission channel degradations.
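To make the idea of a subsequence dynamic time warping cost concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation and does not reproduce the WARP-Q pipeline; the function name `subsequence_dtw_cost`, the Euclidean frame distance, and the step pattern are assumptions chosen for illustration. The inputs stand in for per-frame feature vectors such as MFCCs.

```python
import numpy as np


def subsequence_dtw_cost(query, reference):
    """Minimum accumulated cost of aligning `query` to any contiguous
    stretch of `reference` (textbook subsequence DTW).

    query:     (n, d) array of feature frames, e.g. MFCC vectors.
    reference: (m, d) array of feature frames.
    Hypothetical helper for illustration, not part of any WARP-Q release.
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n, m = len(query), len(reference)

    # Pairwise Euclidean distance between every query and reference frame.
    dist = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)

    acc = np.full((n, m), np.inf)
    # Free start: the first query frame may match any reference frame.
    acc[0, :] = dist[0, :]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev

    # Free end: the last query frame may match any reference frame.
    return float(acc[-1, :].min())
```

A query that is an exact slice of the reference aligns at zero cost, and any distortion of the query raises the cost. That behaviour is what makes an alignment cost usable as a quality indicator in principle.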

Journal: IET Signal Processing
Publication Date: Aug 16, 2022
Citations: 1
License type: CC BY 4.0

Similar Papers

Warp-Q: Quality Prediction for Generative Neural Speech Codecs
Wissam A Jassim ... Michael Chinen
06 Jun 2021

Assessing Segmental Impact for Objective Speech Quality Evaluation
Zhixing Liu ... Gaoxiong Yi
17 Oct 2021

CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric
Rahul Kumar Jaiswal ... Rajesh Kumar Dubey
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 19
03 Feb 2023

Non-intrusive speech quality assessment using context-aware neural networks
Rahul Kumar Jaiswal ... Rajesh Kumar Dubey
International Journal of Speech Technology | VOL. 25
23 Oct 2022