Abstract

Discrepancy between subjective and objective evaluation is a persistent problem in speech processing, including speech enhancement. Subjective evaluation is the ideal, but it is time-consuming and requires many participants. Objective distortion measures have therefore been designed as substitutes for subjective listening tests. However, each distortion measure is optimized under very restricted conditions for a specific application, so subjective and objective evaluations of noise-reduced speech often disagree in real-world use. In this paper, the cause of this discrepancy is investigated in detail by comparing subjective evaluation with short-term objective evaluation. Almost all state-of-the-art distortion measures apply importance weighting across frequency regions. This paper instead considers the temporal variation of speech distortion in order to understand the relationship between subjective and objective evaluation of noise-reduced speech. Distributions of short-term speech distortion were computed using temporal frames of various lengths. The skewness of the short-term speech distortion distribution was found to be a possible clue for explaining the discrepancy between subjective and objective evaluation. [Work supported by NEDO, Japan]
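
The analysis described in the abstract (short-term distortion per frame, then the skewness of its distribution) can be illustrated with a minimal sketch. The abstract does not name the distortion measure or the frame lengths used, so everything specific here is an assumption: the per-frame measure is a simple error-to-signal energy ratio in dB, chosen only as a stand-in, and the function names `short_term_distortion` and `skewness`, the frame lengths, and the synthetic signals are hypothetical.

```python
import numpy as np


def short_term_distortion(clean, processed, frame_len):
    """Per-frame distortion in dB over non-overlapping frames.

    Assumption: error-to-signal energy ratio per frame, used as a
    stand-in because the abstract does not specify the measure.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    c = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    p = processed[: n_frames * frame_len].reshape(n_frames, frame_len)
    eps = 1e-12  # avoids log of zero in silent or error-free frames
    signal_energy = np.sum(c ** 2, axis=1) + eps
    error_energy = np.sum((c - p) ** 2, axis=1) + eps
    return 10.0 * np.log10(error_energy / signal_energy)


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    std = np.sqrt(np.mean(centered ** 2))
    if std == 0.0:
        return 0.0
    return float(np.mean(centered ** 3) / std ** 3)


if __name__ == "__main__":
    # Synthetic stand-ins for a clean signal and a noise-reduced version.
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    processed = clean + 0.1 * rng.standard_normal(16000)
    # Hypothetical frame lengths in samples; the paper's values are not given.
    for frame_len in (160, 320, 640):
        d = short_term_distortion(clean, processed, frame_len)
        print(frame_len, len(d), skewness(d))
```

Repeating the computation over several frame lengths mirrors the abstract's use of temporal frames with various lengths. A strongly skewed distribution would indicate that a few frames carry much larger distortion than the rest, which a single long-term average would hide.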
