Abstract

The perceptual quality of VoIP conversations depends strongly on the pattern of packet losses, i.e., the distribution and duration of packet loss runs: the wider the inter-loss gaps and the shorter the loss runs, the lower the quality degradation. Moreover, speech sequences impaired with an identical packet loss pattern exhibit different degrees of perceptual degradation, because dropped voice packets have unequal impact on perceived quality. We therefore consider the voicing feature of the speech wave carried in lost packets, in addition to the packet loss pattern, when estimating speech quality scores, distinguishing between voiced, unvoiced, and silence packets. This yields better correlation and accuracy between human-based subjective scores and machine-calculated objective scores. This paper proposes novel no-reference parametric speech quality estimation models that account for the voicing feature of the signal carried in missing packets. Specifically, we develop separate quality estimation models that capture the perceptual effect of removed voiced or unvoiced packets, using simple and multiple regression analyses. A further model, which mixes the voiced and unvoiced quality scores to compute the overall speech quality score at the end of an assessment interval, is derived through a rigorous multiple linear regression analysis. The input parameters of the proposed voicing-aware models, namely the Packet Loss Ratio (PLR) and the Effective Burstiness Probability (EBP), are extracted from a novel voicing-aware Markov packet loss model that properly captures both the packet loss process and the voicing property of the speech carried in lost packets. This voicing-aware packet loss model is calibrated at run time by an efficient, packet-loss-event-driven algorithm. Our performance evaluation shows that the voicing-aware models outperform voicing-unaware models, especially in terms of accuracy over a wide range of conditions, and validates the accuracy of the developed parametric no-reference models: predicted scores achieve excellent correlation with measured scores (>0.95) and a small mean absolute deviation (<0.25) for the ITU-T G.729 and G.711 speech codecs.
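To make the parameter-extraction step concrete, the sketch below shows one way an event-driven, voicing-aware loss tracker could maintain per-class counters and expose a per-class PLR together with a burstiness statistic, plus a linear mixing of voiced and unvoiced scores. This is a minimal illustration, not the paper's method: the abstract does not give the structure of the voicing-aware Markov model, the exact definition of EBP, or the fitted regression coefficients, so the Gilbert-style two-state assumption, the stand-in burstiness estimate P(loss | previous packet lost), and the mixing weights used here are hypothetical.

```python
# Illustrative sketch only. Assumes a Gilbert-style two-state loss process per
# voicing class; the conditional loss probability P(loss | previous packet lost)
# is used as a stand-in for the paper's Effective Burstiness Probability (EBP).

from dataclasses import dataclass, field

VOICED, UNVOICED, SILENCE = "voiced", "unvoiced", "silence"


def _zero_counts():
    return {VOICED: 0, UNVOICED: 0, SILENCE: 0}


@dataclass
class VoicingAwareLossTracker:
    """Counters updated once per packet event (received or lost)."""
    sent: dict = field(default_factory=_zero_counts)            # packets sent, per class
    lost: dict = field(default_factory=_zero_counts)            # packets lost, per class
    after_loss: dict = field(default_factory=_zero_counts)      # packets following a loss
    lost_after_loss: dict = field(default_factory=_zero_counts) # ...that were also lost
    prev_lost: bool = False                                      # was the previous packet lost?

    def on_packet(self, voicing_class: str, is_lost: bool) -> None:
        """Update per-class counters for one packet of the given voicing class."""
        self.sent[voicing_class] += 1
        if self.prev_lost:
            self.after_loss[voicing_class] += 1
            if is_lost:
                self.lost_after_loss[voicing_class] += 1
        if is_lost:
            self.lost[voicing_class] += 1
        self.prev_lost = is_lost

    def plr(self, voicing_class: str) -> float:
        """Per-class Packet Loss Ratio."""
        n = self.sent[voicing_class]
        return self.lost[voicing_class] / n if n else 0.0

    def burstiness(self, voicing_class: str) -> float:
        """Stand-in burstiness: estimate of P(loss | previous packet lost)."""
        n = self.after_loss[voicing_class]
        return self.lost_after_loss[voicing_class] / n if n else 0.0


def mixed_quality_score(q_voiced: float, q_unvoiced: float,
                        weights=(0.7, 0.3), bias=0.0) -> float:
    """Hypothetical linear mixing of per-class scores; the paper fits the
    actual coefficients by multiple linear regression."""
    return bias + weights[0] * q_voiced + weights[1] * q_unvoiced


if __name__ == "__main__":
    tracker = VoicingAwareLossTracker()
    # Toy trace: (voicing class, lost?) for a handful of packets.
    trace = [(VOICED, False), (VOICED, True), (VOICED, True),
             (UNVOICED, False), (SILENCE, True), (VOICED, False)]
    for cls, lost in trace:
        tracker.on_packet(cls, lost)
    print("voiced PLR:", tracker.plr(VOICED))
    print("voiced burstiness:", tracker.burstiness(VOICED))
```

In this sketch the voicing class of each packet is assumed to be known (e.g., signaled by the sender or inferred at the receiver); how that classification is obtained is outside the scope of the abstract.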
