Critical-band based frequency compression for digital hearing aids

Keiichi Yasu, Masato Hishitani, Yuji Murahara, Takayuki Arai

doi:10.1250/ast.25.61

Open Access

https://doi.org/10.1250/ast.25.61

Abstract

Department of Electrical and Electronics Engineering, Sophia University, 7–1 Kioi-cho, Chiyoda-ku, Tokyo, 102–8554 Japan

(Received 1 April 2003, Accepted for publication 20 June 2003)

Keywords: Hearing aids, Critical band, Speech intelligibility, Real-time processing, DSP

PACS number: 43.66.Yw [DOI: 10.1250/ast.25.61]

1. Introduction

There have been many studies on the human auditory filter and the critical band (e.g., Fletcher [1] and Zwicker [2]). Patterson measured the auditory filter using the notched-noise method [3]. Glasberg and Moore measured the auditory filters of hearing-impaired and normal-hearing people with a notched-noise masker and reported that hearing-impaired people had wider auditory filters than normal-hearing people [4]. In previous studies [5,6], a speech signal was split into 18 critical bands, and the odd-numbered bands were presented to the subject's right ear, while the rest were presented to the left ear. The speech signals became clearer for both normal-hearing and hearing-impaired subjects. This approach, however, is only useful when both ears have similar auditory characteristics. Therefore, we proposed a new method in which each critical band is compressed along the frequency axis in light of the shape of the auditory filters of hearing-impaired people [7] (Fig. 1).

In Exp. 1, two hearing-impaired people subjectively evaluated the quality and intelligibility of speech sounds using the Mean Opinion Score (MOS). In Exp. 2, they took an intelligibility test for an objective evaluation.

2. Algorithms

Two approaches were tested in our previous study [7]. In both approaches, a speech signal was compressed toward the center of each critical band along the frequency axis. The first approach was based on a filter bank with a set of bandpass filters. The second was based on the fast Fourier transform (FFT).
In this paper, we use the FFT-based approach.

First, an input speech signal was divided into frames with a frame length of 512 samples and a frame shift of 128 samples, and each frame was windowed with a Hamming window. Next, the signal for each frame was transformed from the time domain to the frequency domain by the FFT. After the amplitude and phase spectra were calculated, a compressed amplitude spectrum was computed for each band: the amplitude spectrum was compressed toward the center of each critical band along the frequency axis. The compression rate ranged from 10% to 90%. Next, the piece-wise compressed amplitude spectrum was recombined with the original phase spectrum. Finally, the overlap-add (OLA) technique was applied to the IFFT of the resulting spectrum to obtain the final signal. The stimuli were normalized by their RMS. The compression algorithm was simulated using "SIMULINK." Figure 2 shows the block diagram of this technique.

3. Experiments and results

Two experiments were conducted. In Exp. 1, the quality and intelligibility of speech sounds were evaluated. In Exp. 2, an intelligibility score was evaluated. Two hearing-impaired subjects participated in both experiments. Both subjects have hearing levels above 90 dB, are classified as profoundly hearing-impaired, and usually wear hearing aids. Before the experiments, we measured the shapes of the subjects' critical bands with the notched-noise method [3], which we implemented with "SIMULINK." By measuring the shape of the auditory filters of the hearing-impaired subjects, we confirmed that their critical bands were wider than those of normal-hearing people.

3.1. Experiment 1

We prepared the original sounds along with sounds compressed by 20%, 40%, 60% and 80% using the FFT-based approach. 0% compression (appearing in Tables 1 and 2) corresponds to the original speech sounds.

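
As a rough illustration of the FFT-based processing described in Section 2, the following Python sketch (NumPy only) frames the signal, compresses the amplitude spectrum of each band toward the band center, reattaches the original phase, and resynthesizes by overlap-add. The sampling rate, the band edges (Zwicker's conventional values), the linear frequency mapping used for the compression, the division by the summed window, and the function names are all assumptions made for illustration; the text does not specify them, and the original implementation was built in SIMULINK.

```python
import numpy as np

FRAME_LEN = 512    # frame length in samples (from the text)
FRAME_SHIFT = 128  # frame shift in samples (from the text)
FS = 16000         # sampling rate in Hz (assumption, not given in the text)
# Conventional critical-band edges in Hz after Zwicker, cut at FS / 2 (assumption).
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 8000]


def compress_band_amplitudes(amp, rate, fs=FS, edges=BAND_EDGES_HZ):
    """Squeeze the amplitude spectrum of each band toward the band center.

    rate is the compression rate in [0, 1): with rate = 0.2 the content of a
    band occupies 80% of its original width (hypothetical linear mapping).
    """
    n_bins = len(amp)
    freqs = np.arange(n_bins) * fs / (2 * (n_bins - 1))  # rfft bin frequencies
    out = amp.copy()  # bins outside every band pass through unchanged
    scale = 1.0 - rate
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        if not in_band.any():
            continue
        f = freqs[in_band]
        center = 0.5 * (lo + hi)
        # Output bin at f takes its amplitude from the wider source position.
        src = center + (f - center) / scale
        inside = (src >= f[0] - 1e-9) & (src <= f[-1] + 1e-9)
        out[in_band] = np.where(inside, np.interp(src, f, amp[in_band]), 0.0)
    return out


def compress_speech(signal, rate):
    """Frame, compress per band, restore the phase, overlap-add, RMS-normalize."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(FRAME_LEN)
    out = np.zeros(len(signal))
    wsum = np.zeros(len(signal))
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = signal[start:start + FRAME_LEN] * window
        spec = np.fft.rfft(frame)
        amp, phase = np.abs(spec), np.angle(spec)
        new_spec = compress_band_amplitudes(amp, rate) * np.exp(1j * phase)
        out[start:start + FRAME_LEN] += np.fft.irfft(new_spec, n=FRAME_LEN)
        wsum[start:start + FRAME_LEN] += window
    # Undo the summed analysis window (a design choice of this sketch).
    covered = wsum > 0
    out[covered] /= wsum[covered]
    # Match the RMS of the output to the RMS of the input.
    rms_in = np.sqrt(np.mean(signal ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 0:
        out *= rms_in / rms_out
    return out
```

Under these assumptions, a call such as `compress_speech(x, 0.4)` would correspond to the 40% condition, and a rate of 0 returns the input unchanged wherever frames cover the signal. Samples after the last full frame are left at zero, so a practical version would pad the input first.
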
We used six sentences (three spoken by males, three by females) from "The Phoneme-Balanced 1000 Sentence Speech Database" by NTT-Advanced Technology as the speech samples. The experiment was controlled by a personal computer and was conducted in a soundproof room. Subjects made pair-wise subjective comparisons between the original sounds and the processed sounds, and they could play each sound as many times as needed. They then evaluated the quality and intelligibility of the processed speech sounds using the Mean Opinion Score (MOS). In the MOS test, subjects were asked to evaluate sounds on a five-point scale (1–5). Higher numbers indicated a greater degree, and Point 3 was set for the original. They made 48 evaluations (4 compression rates × 6 sentences × 2 repetitions) in all. The stimuli were presented in random order. Table 1 shows the average MOS in Exp. 1.

3.2. Experiment 2

Next, we gave the subjects an intelligibility test. We processed each speech sample with 10% to 90% compression in 10% steps. The speech samples were nonsense Vowel-Consonant-Vowel (VCV) syllables embedded in a Japanese carrier phrase. The speech samples were elicited from a native Japanese male speaker. The vowels in each VCV syllable were /a/, and the consonant varied among the 14 Japanese consonants. Each stimulus was presented twice, and subjects were forced to choose one of the 14 VCVs by clicking a button
