Abstract
This paper proposes a novel two-scale auditory-feature-based algorithm for non-intrusive evaluation of speech quality. Neuron firing probabilities along the length of the basilar membrane, obtained from an explicit auditory model, are used to extract features from the distorted speech signal. This is in contrast to previous methods, which either use standard vocal-tract-based features or incorporate only some aspects of the human auditory perception mechanism. The features are extracted at two scales: a global scale spanning all voiced frames in an utterance, and a local scale spanning voiced frames from contiguous voiced segments in the utterance. This is followed by a simple information fusion at the score level using Gaussian Mixture Models (GMMs). The use of an explicit auditory model to extract features is based on the premise that similar processing (in a qualitative sense) occurs in human speech perception. In addition, auditory feature extraction at two scales captures the effects of both long-term and short-term distortions on speech quality. The proposed algorithm is shown to perform at least as well as the ITU-T Recommendation P.563.
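The two-scale extraction and score-level fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values, segment boundaries, and GMM parameters are all hypothetical stand-ins (in practice the features would come from the auditory model and the GMMs would be trained on rated speech data).

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    comp = []
    for w, m, v in zip(weights, means, variances):
        ll = -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
        comp.append(np.log(w) + ll)
    return np.logaddexp.reduce(comp)

# Hypothetical per-frame auditory features (frames x channels),
# standing in for basilar-membrane firing probabilities.
rng = np.random.default_rng(0)
frames = rng.normal(size=(40, 4))
voiced = np.ones(40, dtype=bool)  # assume every frame is voiced, for brevity

# Global scale: one feature vector summarizing all voiced frames.
global_feat = frames[voiced].mean(axis=0)

# Local scale: one feature vector per contiguous voiced segment
# (two hypothetical segments here).
segments = [frames[0:20], frames[20:40]]
local_feats = [s.mean(axis=0) for s in segments]

# Score-level fusion: each scale yields a GMM log-likelihood score,
# and the two scores are combined (a simple average here).
weights = [0.5, 0.5]
means = [np.zeros(4), np.ones(4)]        # placeholder trained parameters
variances = [np.ones(4), np.ones(4)]
g_score = gmm_loglik(global_feat, weights, means, variances)
l_score = np.mean([gmm_loglik(f, weights, means, variances)
                   for f in local_feats])
fused_score = 0.5 * (g_score + l_score)
```

The averaging step is one simple choice of score-level fusion; a weighted combination or a regression onto subjective quality ratings would fit the same structure.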