Subjective Speech Quality Research Articles

This paper proposes deep Gaussian process (DGP)-based frameworks for multi-speaker speech synthesis and speaker representation learning. A DGP has a deep architecture of Bayesian kernel regression, and it has been reported that DGP-based single-speaker speech synthesis outperforms deep neural network (DNN)-based synthesis in the framework of statistical parametric speech synthesis. By extending this method to multiple speakers, it is expected that higher speech quality can be achieved with fewer training utterances from each speaker. To apply DGPs to multi-speaker speech synthesis, we propose two methods: one using a DGP with one-hot speaker codes, and the other using a deep Gaussian process latent variable model (DGPLVM). The DGP with one-hot speaker codes uses additional GP layers to transform speaker codes into latent speaker representations. The DGPLVM directly models the distribution of latent speaker representations and learns it jointly with the acoustic model parameters. In the DGPLVM, acoustic speaker similarity is expressed in terms of the similarity of the speaker representations, and thus the voices of similar speakers are modeled efficiently. We experimentally evaluated the proposed methods against conventional DNN-based and variational autoencoder (VAE)-based frameworks in terms of acoustic feature distortion and subjective speech quality. The experimental results demonstrate that (1) the proposed DGP-based and DGPLVM-based methods improve subjective speech quality compared with a feed-forward DNN-based method, and (2) even when the amount of training data for target speakers is limited, the DGPLVM-based method outperforms the other methods, including the VAE-based one. Additionally, (3) by using a speaker representation randomly sampled from the learned speaker space, the DGPLVM-based method can generate voices of non-existent speakers.
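
The contrast between one-hot speaker codes and latent speaker representations can be illustrated with a small sketch. This is not the paper's model: it only assumes a squared-exponential (RBF) kernel as the similarity measure, and the speaker names and 2-D latent coordinates are made up for illustration. It shows that distinct one-hot codes are all equally far apart, whereas latent vectors can place similar speakers close together.

```python
import math

def one_hot(speaker_index, num_speakers):
    """Return a one-hot speaker code as a list of floats."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index out of range")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel: exp(-||x - y||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

# Hypothetical 2-D latent speaker representations (illustrative values only).
latent = {"spk_a": [0.0, 0.0], "spk_b": [0.1, 0.0], "spk_c": [2.0, 2.0]}

# One-hot codes for the same three speakers.
codes = {name: one_hot(i, len(latent)) for i, name in enumerate(latent)}

# Latent space: spk_a is close to spk_b and far from spk_c.
sim_latent_ab = rbf_kernel(latent["spk_a"], latent["spk_b"])
sim_latent_ac = rbf_kernel(latent["spk_a"], latent["spk_c"])

# One-hot space: every pair of distinct speakers is equally similar.
sim_code_ab = rbf_kernel(codes["spk_a"], codes["spk_b"])
sim_code_ac = rbf_kernel(codes["spk_a"], codes["spk_c"])

print(sim_latent_ab > sim_latent_ac)
print(math.isclose(sim_code_ab, sim_code_ac))
```

Because one-hot codes carry no notion of closeness by themselves, any similarity structure has to come from a learned mapping or a learned latent space, which is the role the abstract assigns to the additional GP layers and to the DGPLVM.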

Frequency lowering (FL) technology offers a means of improving audibility of high-frequency sounds. For some listeners, the benefit of such technology can be accompanied by a perceived degradation in sound quality, depending on the strength of the FL setting. The studies presented in this article investigate the effect of a new type of FL signal processing for hearing aids, adaptive nonlinear frequency compression (ANFC), on subjective speech quality. Listener ratings of sound quality were collected for speech stimuli processed with systematically varied fitting parameters. Study 1 included 40 normal-hearing (NH) adult and child listeners. Study 2 included 11 hearing-impaired (HI) adult and child listeners. HI listeners were fitted with laboratory-worn hearing aids for use during listening tasks. Speech quality ratings were assessed across test conditions consisting of various strengths of static nonlinear frequency compression (NFC) and ANFC speech. Test conditions included those that were fine-tuned on an individual basis per hearing aid fitting and conditions that were modified to intentionally alter the sound quality of the signal. Listeners rated speech quality using the MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA) test paradigm. Ratings were analyzed for reliability and to compare results across conditions. Results show that interrater reliability is high for both studies, indicating that NH and HI listeners from both adult and child age groups can reliably complete the MUSHRA task. Results comparing sound quality ratings across experimental conditions suggest that both the NH and HI listener groups rate the stimuli intended to have poor sound quality (e.g., anchors and the strongest available parameter settings) as having below-average sound quality ratings. A different trend in the results is reported when considering the other experimental conditions across the listener groups in the studies. 
Speech quality ratings measured with NH listeners improve as the strength of ANFC decreases, with average ratings ranging from bad to good. Speech quality ratings measured with HI listeners are similar and above average for many of the experimental stimuli, including those with fine-tuned NFC and ANFC parameters. Overall, HI listeners provide similar sound quality ratings for static and adaptive forms of frequency compression, especially with the individualized parameter settings. These findings suggest that a range of settings may result in above-average sound quality for adults and children with hearing impairment. Furthermore, the fitter should fine-tune FL parameters for each individual listener, regardless of the type of FL technology.
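
As a rough illustration of how MUSHRA-style ratings might be summarized, the sketch below averages scores per condition, maps them to the five interval labels of the 0-100 MUSHRA scale, and computes a simple agreement measure. The abstract does not state which reliability statistic the study used, so the mean pairwise Pearson correlation here is only an assumed stand-in. The listener names, condition names, and scores are hypothetical, and assigning boundary values such as 80 to the upper band is an arbitrary choice.

```python
from itertools import combinations
from statistics import mean

# Hypothetical ratings on the 0-100 MUSHRA scale: listener -> condition -> score.
ratings = {
    "listener_1": {"reference": 95, "weak_anfc": 80, "strong_anfc": 35, "anchor": 10},
    "listener_2": {"reference": 90, "weak_anfc": 70, "strong_anfc": 40, "anchor": 15},
    "listener_3": {"reference": 100, "weak_anfc": 75, "strong_anfc": 30, "anchor": 5},
}

def label(score):
    """Map a 0-100 score to one of the five MUSHRA interval labels."""
    bands = [(20, "bad"), (40, "poor"), (60, "fair"), (80, "good")]
    for upper, name in bands:
        if score < upper:
            return name
    return "excellent"

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

conditions = list(next(iter(ratings.values())))

# Mean score and interval label per condition.
condition_means = {c: mean(r[c] for r in ratings.values()) for c in conditions}
condition_labels = {c: label(m) for c, m in condition_means.items()}

# Assumed agreement proxy: mean Pearson correlation over all listener pairs.
pairwise = [
    pearson([ratings[a][c] for c in conditions], [ratings[b][c] for c in conditions])
    for a, b in combinations(ratings, 2)
]
agreement = mean(pairwise)

print(condition_labels)
print(agreement)
```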

Articles published on Subjective Speech Quality

Evaluation of digital watermarking on subjective speech quality

Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation

Speech signal enhancement in cocktail party scenarios by deep learning based virtual sensing of head-mounted microphones

Speech quality estimation with deep lattice networks.

Use of Emotional and Neutral Speech in Evaluating Compression Speeds.

Noise and acoustic conditions of premises for hearing-impaired people in Korea

Towards speech quality assessment using a crowdsourcing approach: evaluation of standardized methods

Sound Quality Effects of an Adaptive Nonlinear Frequency Compression Processor with Normal-Hearing and Hearing-Impaired Listeners.

Objective and Subjective Speech Quality Assessment of Amplification Devices for Patients With Parkinson's Disease.

Subjective Self-Rated Speech Intelligibility and Quality of Life in Patients with Parkinson’s Disease in a Malaysian Sample

Subjective speech quality measurement with and without parallel task: Laboratory test results comparison

Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis

Spectral difference for statistical model-based speech enhancement in speech recognition

Subjective speech quality measurement repeatability: comparison of laboratory test results

Novel adaptive muting technique for packet loss concealment of ITU-T G.722 using optimized parametric shaping functions

On using multivariate polynomial regression model with spectral difference for statistical model-based speech enhancement

Evaluation of VoIP Speech Quality Using Neural Network

The relationship between fluency, intelligibility, and acceptability of non-native spoken English

Comparison of two channel selection criteria for noise suppression in cochlear implants

Compressed domain speech enhancement method based on ITU-T G.722.2
