Abstract

Natural speech is modeled by linear prediction using Prony's formulation. An all-pole model is assumed for voiced segments, while a pole-zero model is considered more suitable for unvoiced segments. The relative magnitude of the short-time spectrum is also determined at the pole and zero frequencies. The effects of poles and zeros, and of their interaction (which causes masking), on intelligibility were investigated by adaptively filtering them based on their frequency, bandwidth, and magnitude. A recursive digital filter was used to suppress real roots (corresponding to the glottal source and vocal radiation) and/or complex roots (corresponding to vocal tract resonances). This approach was taken so that one or more roots could be removed from the natural speech at a time without affecting anything else. A semiautomatic on-line intelligibility measurement system utilizing pattern recognition concepts has been developed for phonetically balanced (PB) word listening tests. Preliminary results indicate that suppressing the real poles and the poles corresponding to the third and fourth resonances does not degrade intelligibility. The effect of the zeros, and of the poles for the first and second resonances, is discussed based on formal intelligibility test results. [The authors acknowledge Satish Chandra of CWRU for the zero-estimation algorithm.]
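The root-removal idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a least-squares (covariance-method) all-pole fit as a stand-in for Prony's formulation, uses the common approximation B = -(fs/π) ln|z| for pole bandwidth, and cancels the selected poles with a simple FIR section rather than the recursive filter mentioned in the abstract. The function names (`lpc_covariance`, `describe_poles`, `suppress_poles`), the 8 kHz sampling rate, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def lpc_covariance(x, order):
    """Least-squares all-pole fit (assumed stand-in for Prony's formulation).
    Returns a = [1, a1, ..., ap] with x[n] + sum_k a_k x[n-k] ~ 0."""
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    # Column k holds x[n-k] for n = order .. n_samples-1
    past = np.column_stack(
        [x[order - k: n_samples - k] for k in range(1, order + 1)]
    )
    coeffs, *_ = np.linalg.lstsq(past, -x[order:], rcond=None)
    return np.concatenate(([1.0], coeffs))

def describe_poles(a, fs):
    """Return poles, their frequencies (Hz), bandwidths (Hz), and a real-root mask."""
    poles = np.roots(a)
    freqs = np.abs(np.angle(poles)) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    is_real = np.abs(poles.imag) < 1e-8
    return poles, freqs, bandwidths, is_real

def suppress_poles(x, poles_to_remove):
    """Cancel selected poles by placing FIR zeros on them.
    Complex poles should be passed together with their conjugates
    so that the filter coefficients are real."""
    b = np.atleast_1d(np.real(np.poly(poles_to_remove)))
    return np.convolve(x, b)[: len(x)]

# Synthetic example (assumed values): one real pole plus one resonance at 1 kHz.
fs = 8000.0
resonance = 0.95 * np.exp(1j * 2.0 * np.pi * 1000.0 / fs)
a_true = np.real(np.poly([0.9, resonance, np.conj(resonance)]))

# Impulse response of the all-pole system 1 / A(z)
x = np.zeros(200)
for n in range(len(x)):
    acc = 1.0 if n == 0 else 0.0
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * x[n - k]
    x[n] = acc

a_est = lpc_covariance(x, order=3)
poles, freqs, bandwidths, is_real = describe_poles(a_est, fs)

# Remove only the real root, leaving the resonance untouched
y = suppress_poles(x, poles[is_real])
a_after = lpc_covariance(y, order=2)
poles_after, freqs_after, _, is_real_after = describe_poles(a_after, fs)
```

Re-analyzing the filtered signal `y` with a second-order model is one way to check that only the selected root was removed while the remaining resonance keeps its frequency.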
