Abstract

In any speech communication system, background noise degrades the quality or intelligibility of speech. Speech quality refers to naturalness and cleanliness, that is, how good the signal is perceived to be. Speech intelligibility refers to how well the listener understands the speaker's message. Speech corrupted by noise (far-end speech) and speech rendered in a noisy environment (near-end speech) lack intelligibility and are uncomfortable to listen to. The current work aims to improve speech intelligibility for a near-end listener. State-of-the-art near-end speech enhancement algorithms improve intelligibility by reallocating speech energy over time and frequency based on a perceptual distortion measure. This method automatically redistributes speech energy from vowels to transients with reduced delay. However, it treats the different classes of consonant sounds (e.g., fricatives, stops, liquids, nasals) as a single group during energy reallocation. The effect of noise is not uniform: it varies across consonant classes. Therefore, an analysis is carried out to determine the energy relation among the classes of sound units, and weightings are given to low-energy classes of sounds. The weighted energy reallocation method is then evaluated for speech quality and intelligibility using PESQ and STOI. In addition, an analysis is carried out to find the optimum segment size for energy reallocation.
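
The idea of class-weighted energy reallocation can be illustrated with a minimal sketch. This is not the method evaluated in the paper, which relies on a perceptual distortion measure; it only shows how per-class weights might shift energy toward low-energy sound classes while keeping the total signal energy unchanged. The weight values, the class labels, and the function names `segment_energy` and `reallocate_energy` are all assumptions made for illustration.

```python
import math

# Hypothetical per-class weights (NOT taken from the paper):
# values above 1 boost a class relative to the others.
ASSUMED_WEIGHTS = {"vowel": 1.0, "nasal": 1.5, "fricative": 2.0, "stop": 2.5}


def segment_energy(samples):
    """Return the sum of squared sample values of one segment."""
    return sum(x * x for x in samples)


def reallocate_energy(segments, labels, weights=ASSUMED_WEIGHTS):
    """Rescale segments by class weight while preserving total energy.

    segments: list of lists of float samples, one list per segment.
    labels:   sound-class label for each segment (same length as segments).
    """
    energies = [segment_energy(s) for s in segments]
    total = sum(energies)
    weighted_total = sum(
        weights.get(lab, 1.0) * e for lab, e in zip(labels, energies)
    )
    if total == 0.0 or weighted_total == 0.0:
        # Silent input: there is no energy to move around.
        return [list(s) for s in segments]

    # Normalisation factor so the output energy equals the input energy.
    norm = total / weighted_total
    out = []
    for seg, lab in zip(segments, labels):
        # Target energy is weight * energy * norm, so the amplitude gain
        # is the square root of (weight * norm).
        gain = math.sqrt(weights.get(lab, 1.0) * norm)
        out.append([gain * x for x in seg])
    return out


if __name__ == "__main__":
    demo_segments = [[1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]]
    demo_labels = ["vowel", "fricative"]
    for seg in reallocate_energy(demo_segments, demo_labels):
        print(segment_energy(seg))
```

In this sketch the high-energy vowel segment gives up a small amount of energy and the low-energy fricative segment gains it, which mirrors the vowel-to-consonant redistribution described above. A realistic system would derive the gains from a perceptual criterion and the noise statistics rather than from fixed weights.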
