Abstract

In this study, the generalized parametric spectral subtraction estimator is employed within a ROVER speech enhancement framework to develop a robust phoneme-class-selective enhancement algorithm. The parametric estimator is derived by (a) optimizing the weighted Euclidean distortion cost function and (b) modeling the clean speech spectral magnitudes as Rayleigh distributed. A set of enhanced utterances is generated from a single noisy utterance by tuning the parameters of the parametric estimator for different phoneme classes. Speech and non-speech segments are separated using a voice activity detector. The mixture maximum model is then used to make soft decisions on these segments, which determine their phoneme class weights. The segments from the enhanced utterances are weighted by these decisions and combined to form the final composite utterance. Evaluated with segmental SNR and Itakura-Saito metrics over two noise types and four SNR levels, the composite utterance exhibited better phoneme class improvement than the individual utterances enhanced by the parametric estimator.
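
The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the class-tuned enhanced utterances are time-aligned arrays of equal length, that segments are fixed-length, non-overlapping frames, and that the soft decisions are already available as non-negative per-frame weights. The function name `combine_enhanced` and the parameter `frame_len` are hypothetical.

```python
import numpy as np


def combine_enhanced(enhanced, weights, frame_len):
    """Blend class-tuned enhanced utterances using soft per-frame weights.

    enhanced  : (K, N) array, one enhanced version of the utterance per
                phoneme class (assumed time-aligned).
    weights   : (K, F) array of non-negative soft class weights per frame.
    frame_len : samples per frame (assumed fixed and non-overlapping).
    Returns the composite utterance as an (N,) array.
    """
    enhanced = np.asarray(enhanced, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_classes, n_samples = enhanced.shape
    if weights.shape[0] != n_classes:
        raise ValueError("one weight row is needed per enhanced utterance")
    if weights.shape[1] * frame_len < n_samples:
        raise ValueError("weights do not cover the whole utterance")

    # Normalise so the class weights of each frame sum to one.
    totals = weights.sum(axis=0, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every frame needs at least one positive weight")
    weights = weights / totals

    # Stretch per-frame weights to per-sample weights, then blend.
    per_sample = np.repeat(weights, frame_len, axis=1)[:, :n_samples]
    return (per_sample * enhanced).sum(axis=0)


# Toy example: two class-tuned versions, two frames of two samples each.
composite = combine_enhanced(
    enhanced=[[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]],
    weights=[[1.0, 0.5], [0.0, 0.5]],
    frame_len=2,
)
```

In this toy case the first frame is taken entirely from the first version and the second frame is an equal blend of both. A practical system would more likely use overlapping, windowed frames with overlap-add, which this sketch omits.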


