Output-based speech quality assessment using autoencoder and support vector regression

Jing Wang, Yahui Shan, Xiang Xie, Jingming Kuang

doi:10.1016/j.specom.2019.04.002

Abstract

Output-based speech quality assessment has been widely used and has received increasing attention because it does not need the undistorted signal as a reference. To obtain a high correlation between predicted scores and subjective results, this paper presents a new speech quality assessment method that estimates the quality of degraded speech without the reference speech. Bottleneck features are extracted with an autoencoder, and support vector regression is chosen as the mapping model from the objective representation to subjective scores. Experiments are conducted on a dataset containing various degraded speech signals and their subjective listening scores. The proposed method takes advantage of the autoencoder's ability to form a good representation of its input, one that can be better mapped to the Mean Opinion Score (MOS). The experimental results show that, compared with the ITU-T P.563 standard and another deep learning-based assessment method, the proposed method achieves a higher correlation coefficient between predicted scores and subjective scores.
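
The pipeline described in the abstract (an autoencoder that yields bottleneck features, followed by support vector regression onto MOS) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' configuration: the input is random synthetic data standing in for per-utterance acoustic features, the MOS labels are synthetic, scikit-learn's `MLPRegressor` trained to reconstruct its own input stands in for the autoencoder, and the layer sizes, `tanh` activation, and SVR hyperparameters are placeholders. The helper names `train_autoencoder` and `encode` are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def train_autoencoder(features, bottleneck_dim=8, seed=0):
    """Fit an MLP to reconstruct its own input (a simple autoencoder stand-in)."""
    ae = MLPRegressor(hidden_layer_sizes=(32, bottleneck_dim, 32),
                      activation="tanh", max_iter=500, random_state=seed)
    ae.fit(features, features)  # target equals input
    return ae


def encode(ae, features, n_encoder_layers=2):
    """Run only the encoder half of the network, stopping at the bottleneck."""
    h = features
    layers = zip(ae.coefs_[:n_encoder_layers], ae.intercepts_[:n_encoder_layers])
    for weights, bias in layers:
        h = np.tanh(h @ weights + bias)
    return h


# Synthetic stand-in data (assumption): 200 "utterances", 40 features each,
# with MOS-like labels clipped to the 1..5 scale.
rng = np.random.default_rng(0)
n_utterances, n_features = 200, 40
X = rng.normal(size=(n_utterances, n_features))
noise = rng.normal(scale=0.2, size=n_utterances)
mos = np.clip(3.0 + 0.4 * X[:, :5].sum(axis=1) + noise, 1.0, 5.0)

# 1) Learn a compact representation without using the labels.
X_scaled = StandardScaler().fit_transform(X)
ae = train_autoencoder(X_scaled)
codes = encode(ae, X_scaled)  # bottleneck features, shape (200, 8)

# 2) Map bottleneck features to subjective scores with SVR.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(codes[:150], mos[:150])
pred = svr.predict(codes[150:])

# 3) Evaluate as in the abstract: correlation between predicted and subjective scores.
corr = np.corrcoef(pred, mos[150:])[0, 1]
print(f"Pearson correlation on held-out utterances: {corr:.3f}")
```

No reference signal appears anywhere in the sketch, which is what makes the approach output-based: both the representation and the regression are computed from the degraded signal's features alone. The correlation printed on synthetic data carries no meaning for real speech.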

Full Text