Two signal-processing algorithms, designed to separate the voiced speech of two talkers speaking simultaneously at similar intensities in a single channel, were compared and evaluated. Both algorithms exploit the harmonic structure of voiced speech and require a difference in fundamental frequency (F0) between the voices to operate successfully. One attenuates the interfering voice by filtering the cepstrum of the combined signal. The other uses the method of harmonic selection [T. W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)] to resynthesize the target voice from fragmentary spectral information. Two perceptual evaluations were carried out. One involved the separation of pairs of vowels synthesized on static F0's; the other involved the recovery of consonant-vowel (CV) words masked by a synthesized vowel. Normal-hearing listeners and four listeners with moderate-to-severe, bilateral, symmetrical, sensorineural hearing impairments were tested. All listeners showed increased accuracy of identification when the target voice was enhanced by processing. The vowel-identification data show that intelligibility enhancement is possible over a range of F0 separations between the target and interfering voice. The recovery of CV words demonstrates that the processing is valid not only for spectrally static vowels but also for less intense time-varying voiced consonants. The results for the impaired listeners suggest that the algorithms may be applicable as components of a noise-reduction system in future digital signal-processing hearing aids. The vowel-separation test, and subjective listening, suggest that harmonic selection, which is the more computationally expensive method, produces the more effective voice separation.
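As an illustration of the idea behind harmonic selection, the following minimal sketch separates two synthetic harmonic complexes on static F0s by keeping only the spectral components at integer multiples of the target F0. It is an idealized toy and not the procedure of Parsons (1976) or the implementation evaluated here. It assumes that both F0s are known in advance, that the signals are perfectly stationary, and that every harmonic falls exactly on an FFT bin. The sampling rate, the F0 values (100 and 126 Hz), the harmonic counts, and the function names `harmonic_complex` and `select_harmonics` are all hypothetical choices made for this example.

```python
import numpy as np

FS = 8000   # sampling rate in Hz (assumed for this sketch)
N = FS      # 1 s frame, so FFT bins are spaced 1 Hz apart


def harmonic_complex(f0, n_harmonics, fs=FS, n=N):
    """Sum of equal-amplitude cosine harmonics of f0 (a crude static 'voice')."""
    t = np.arange(n) / fs
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harmonics + 1))


def select_harmonics(mixture, f0, fs=FS):
    """Keep only FFT bins lying on integer multiples of f0, then resynthesize.

    Assumes f0 is known and constant over the frame (hypothetical simplification).
    """
    spectrum = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), d=1 / fs)
    bin_width = fs / len(mixture)
    # Frequency of the harmonic of f0 nearest to each bin.
    nearest = np.round(freqs / f0) * f0
    # Keep a bin only if it sits on a harmonic; drop the DC bin.
    keep = (np.abs(freqs - nearest) < bin_width / 2) & (freqs > 0)
    return np.fft.irfft(spectrum * keep, n=len(mixture))


# Target and interfering "voices" whose harmonics never coincide below Nyquist.
target = harmonic_complex(100, 30)       # harmonics up to 3000 Hz
interferer = harmonic_complex(126, 25)   # harmonics up to 3150 Hz
mixture = target + interferer

recovered = select_harmonics(mixture, f0=100)
print("max |recovered - target| =", np.max(np.abs(recovered - target)))
```

Under these idealized conditions the residual is at the level of floating-point error. With natural speech the F0s vary over time, must be estimated from the mixture, and produce overlapping spectral peaks, which is why the target voice has to be resynthesized from fragmentary spectral information.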