Abstract

The need for reliable objective voice quality assessment techniques is growing rapidly. Developers of voice systems and services, procurers of voice communication equipment, and standards organizations all have pressing requirements for voice performance measurement. Many objective assessment measures have been proposed and studied. Currently, an objective technique accurate and robust enough to replace human listeners in all situations does not appear practical. However, the performance of several objective methods (e.g., those being considered by CCITT) is quite promising, and may justify their use in many cases where listener tests can be augmented or replaced completely. Standards defining these objective measures and their intended applications can be expected in the near future. Several important areas of research related to objective voice quality assessment can be identified:

1. It is doubtful that even very complex distortion measures will be sufficient to accurately predict voice quality in many real-world cases. For example, simple added delay has no effect on perceived quality unless two-way conversation is taking place, in which case the impact can be significant. Other examples include echo, temporal warping (i.e., time-varying delay), and even added impulsive noise such as “pops” and “clicks.” A much more involved psychoacoustic model of hearing may be needed to address these impairments, perhaps requiring expert systems or neural network technology to implement the model.

2. Measures of voice quality are increasingly sought for vocoder, or synthesized, speech. Distortion measures such as those discussed in this paper are not directly applicable, since generally no effort is made to reproduce the input voice signal. Only higher-level information such as phonemes is transmitted.

3. Application-specific objective measures may be a better solution than attempting to identify a single parameter for use in all situations. For example, voice transmitted via single sideband radio will be subject to significantly different types of degradation than those found in telephone speech. Perceived quality is also highly dependent on application; an air traffic controller is more concerned with intelligibility than with tonal fidelity, while the opposite may be true for someone using the telephone to call home. The ITS objective method is one approach to this: it allows a new parameter set and a new training database to be invoked for new applications.

These areas and others will see increasing interest as technical and economic pressure continues to build for dependable objective voice quality assessment techniques. Recommendations expected soon from CCITT will likely represent only a partial solution to this difficult problem, and continued research is crucial to develop more robust and reliable techniques.
