Most existing algorithms for objective audio quality assessment are intrusive, as they require access to both an unimpaired reference recording and the evaluated signal. This requirement excludes them from many practical applications. In this paper, we introduce a non-intrusive audio quality assessment method. The proposed method is intended to account for audio artefacts arising from the lossy compression of music signals. During its development, 250 high-quality uncompressed music recordings were collated. They were subsequently processed using a selection of five popular audio codecs, resulting in a repository of 13,000 audio excerpts representing various levels of audio quality. The proposed non-intrusive method was trained with data obtained using a well-established intrusive model (ViSQOL v3). Next, the performance of the trained model was evaluated using quality scores obtained in subjective listening tests undertaken remotely over the Internet. The listening tests were carried out in compliance with the MUSHRA recommendation (ITU-R BS.1534-3). In this study, the following three convolutional neural networks were compared: (1) a model employing 1D convolutional filters, (2) an Inception-based model, and (3) a VGG-based model. The last-mentioned model outperformed the model employing 1D convolutional filters in terms of predicting the scores from the listening tests, reaching a correlation value of 0.893. The performance of the Inception-based model was similar to that of the VGG-based model. Moreover, the VGG-based model outperformed a method employing a stacked gated-recurrent-unit-based deep learning framework, recently introduced by Mumtaz et al. (2022).