Speech enhancement in diverse noisy conditions has historically focused on the spectral magnitude of the vocal tract. However, studies have shown that reliable prosody/F0 (pitch) information affects perceived quality, speaking style, and speaker identity for human listeners. In this study, we propose a speech enhancement algorithm based on pitch pattern matching. It can be considered an example-based method, since we attempt to replace each speech segment in the noisy speech with the corresponding detected components from a dictionary of clean speech signals. The average pitch value and the overall pitch dynamic trend are used as features for pitch pattern matching. The dictionary segment whose pitch pattern feature best matches is used to assist in computing the enhanced speech segment for the noisy speech sample. Here, a Wiener filter is used to obtain the baseline target speech from the noisy speech. The experimental results show that the pitch pattern feature is more computationally efficient than a spectral-based feature alone for speech enhancement, while achieving similar performance in terms of both speech intelligibility and speech quality.
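The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the F0 contour of each segment is already available as a list of per-frame values in Hz (with 0 marking unvoiced frames), takes the "pitch dynamic trend" to be the slope of a least-squares line through the voiced frames, and compares features with a weighted Euclidean distance. The function names and the `slope_weight` parameter are hypothetical.

```python
from math import hypot


def pitch_pattern_feature(f0):
    """Return (mean F0, linear trend slope) for one segment's F0 contour.

    Assumption: f0 holds per-frame pitch in Hz and unvoiced frames are <= 0.
    """
    voiced = [(i, v) for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return (0.0, 0.0)
    n = len(voiced)
    mean_t = sum(i for i, _ in voiced) / n
    mean_f = sum(v for _, v in voiced) / n
    var_t = sum((i - mean_t) ** 2 for i, _ in voiced)
    if var_t == 0:
        # A single voiced frame has no trend.
        return (mean_f, 0.0)
    # Least-squares slope in Hz per frame over the voiced frames.
    slope = sum((i - mean_t) * (v - mean_f) for i, v in voiced) / var_t
    return (mean_f, slope)


def best_match(noisy_f0, dictionary_f0, slope_weight=10.0):
    """Return the index of the dictionary segment with the closest pitch pattern.

    slope_weight is a hypothetical scaling that balances the slope term
    against the mean term; it is not taken from the paper.
    """
    target_mean, target_slope = pitch_pattern_feature(noisy_f0)

    def distance(k):
        mean_f, slope = pitch_pattern_feature(dictionary_f0[k])
        return hypot(mean_f - target_mean, slope_weight * (slope - target_slope))

    return min(range(len(dictionary_f0)), key=distance)


# Usage: a rising noisy contour should select the rising clean segment.
noisy = [100, 0, 110, 115, 120]
clean_segments = [
    [200, 205, 210, 215],       # rising, but much higher pitch
    [100, 105, 110, 115, 120],  # rising, similar pitch
    [120, 115, 110, 105, 100],  # falling, similar pitch
]
matched_index = best_match(noisy, clean_segments)
```

Each segment is reduced to two numbers, so a dictionary search of this kind compares far fewer values per candidate than a frame-by-frame spectral comparison would, which is consistent with the efficiency claim in the abstract.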