Abstract
A multi-resolution framework using auditory perception-based wavelet packet transform is invoked in multi-resolution auditory model (MRAM) and used for non-intrusive objective speech quality estimation. The MRAM provides a detailed time-frequency modelling of the human auditory system compared to earlier models that have been used for non-intrusive speech quality estimation. The objective Mean Opinion Score (MOS) of a degraded narrowband speech utterance has been estimated by Gaussian Mixture Model (GMM) probabilistic approach using MRAM-based feature vector. Additionally, a recent auditory model (Lyons’ auditory model) based features, mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) features have also been used independently for comparison of the performance of MRAM features. The combination of MFCC and LSF features with MRAM features for non-intrusive speech quality estimation using GMM probabilistic approach has been proposed and investigated. The performance of these feature vectors has been evaluated and compared with ITU-T Recommendation P.563 and a recent published work by computing correlation coefficient and root-mean-square error between the subjective MOS and the estimated objective MOS. It is found that the proposed method that uses a combination of MRAM features, MFCC, and LSF feature vectors for non-intrusive speech quality performs better than both the other algorithms.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.