Abstract

Department of Electrical and Electronics Engineering, Sophia University, 7–1 Kioi-cho, Chiyoda-ku, Tokyo, 102–8554 Japan
(Received 1 April 2003, Accepted for publication 20 June 2003)

Keywords: Hearing aids, Critical band, Speech intelligibility, Real-time processing, DSP
PACS number: 43.66.Yw [DOI: 10.1250/ast.25.61]

1. Introduction

There have been many studies on the human auditory filter and the critical band (e.g., Fletcher [1] and Zwicker [2]). Patterson measured the auditory filter using the notched-noise method [3]. Glasberg and Moore measured the auditory filters of hearing-impaired and normal-hearing people with a notched-noise masker and reported that hearing-impaired people had wider auditory filters than normal-hearing people [4]. In previous studies [5,6], a speech signal was split into 18 critical bands, and the odd-numbered bands were presented to the subject's right ear, while the rest were presented to the left ear. The speech signals became clearer for both normal-hearing and hearing-impaired subjects. This approach, however, is only useful when both ears have similar auditory characteristics. Therefore, we proposed a new method in which each critical band is compressed along the frequency axis in light of the shape of the auditory filters of hearing-impaired people [7] (Fig. 1).

In Exp. 1, two hearing-impaired people subjectively evaluated the quality and intelligibility of speech sounds using the Mean Opinion Score (MOS). In Exp. 2, they took an intelligibility test for an objective evaluation.

2. Algorithms

Two approaches were tested in our previous study [7]. In both approaches, a speech signal was compressed toward the center of each critical band along the frequency axis. The first approach was based on a filter bank with a set of bandpass filters. The second was based on the fast Fourier transform (FFT).
In this paper, we use the FFT-based approach.

First, an input speech signal was divided into frames with a frame length of 512 samples and a frame shift of 128 samples, and each frame was windowed with a Hamming window. Next, each frame was transformed from the time domain to the frequency domain by the FFT. After the amplitude and phase spectra were calculated, a compressed amplitude spectrum was computed for each band: the amplitude spectrum was compressed toward the center of each critical band along the frequency axis. The compression rate ranged from 10% to 90%. Next, the piecewise-compressed amplitude spectrum was recombined with the original phase spectrum. Finally, the overlap-add (OLA) technique was applied to the inverse FFT (IFFT) of the resulting spectrum to obtain the final signal. The stimuli were normalized by their RMS. The compression algorithm was implemented as a simulation in "SIMULINK." Figure 2 shows the block diagram of this technique.

3. Experiments and results

Two experiments were conducted. In Exp. 1, the quality and intelligibility of speech sounds were evaluated. In Exp. 2, an intelligibility score was evaluated. Two hearing-impaired subjects participated in both experiments. Both subjects have hearing levels above 90 dB, are classified as profoundly hearing-impaired, and usually wear hearing aids. Before the experiments, we measured the shapes of the subjects' critical bands with the notched-noise method [3], which we implemented in "SIMULINK." By measuring the shape of the auditory filter of the hearing-impaired subjects, we confirmed that their critical bands were wider than those of normal-hearing people.

3.1. Experiment 1

We prepared the original speech sounds along with versions compressed by 20%, 40%, 60% and 80% using the FFT-based approach. 0% compression (appearing in Tables 1 and 2) corresponds to the original speech sounds.
We used six sentences (three spoken by males, three by females) as the speech samples, taken from "The Phoneme-Balanced 1000 Sentence Speech Database" by NTT-Advanced Technology. The experiment was controlled by a personal computer and was conducted in a soundproof room. Subjects made pairwise subjective comparisons between the original sounds and the processed sounds, and they could play each sound as many times as needed. They then evaluated the quality and intelligibility of the processed speech sounds using the Mean Opinion Score (MOS). In the MOS test, subjects were asked to rate sounds on a five-point scale (1–5). Higher numbers indicated a greater degree of quality or intelligibility, and point 3 was assigned to the original. They made 48 evaluations (4 compression rates × 6 sentences × 2 repetitions) in all. The stimuli were presented in random order. Table 1 shows the average MOS in Exp. 1.

3.2. Experiment 2

Next, we gave the subjects an intelligibility test. We processed each speech sample at compression rates from 10% to 90% in 10% steps. The speech samples were nonsense Vowel-Consonant-Vowel (VCV) syllables embedded in a Japanese carrier phrase. The speech samples were elicited from a native male speaker of Japanese. The vowels in each VCV syllable were /a/, and the consonant was one of the 14 Japanese consonants. Each stimulus was presented twice, and subjects were forced to choose one of 14 VCVs by clicking a button
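
The FFT-based procedure described in Section 2 (framing, Hamming window, FFT, per-band amplitude compression, original phase, IFFT, overlap-add, RMS normalization) can be illustrated with a minimal Python/NumPy sketch. This is not the authors' "SIMULINK" implementation, and several details are assumptions because the text does not specify them: the band edges (the first 18 bands of Zwicker's commonly tabulated Bark scale), the band center (arithmetic midpoint), the remapping rule (linear interpolation that shrinks each band's spectrum to a fraction 1 - rate of its width and zeroes the rest of the band), pass-through of bins above the highest edge, division by the summed window after overlap-add, and matching the output RMS to the input RMS. The function names and the sampling rate argument are hypothetical.

```python
import numpy as np

# ASSUMPTION: Bark-scale band edges in Hz (19 edges = 18 bands).
# The source does not list the edges it used.
BAND_EDGES_HZ = (0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400)


def compress_band_amplitudes(amp, freqs, edges, rate):
    """Squeeze the amplitude spectrum of each band toward the band center.

    rate = 0 leaves the spectrum unchanged; rate = 0.5 halves each
    band's occupied width (ASSUMED interpretation of "compression rate").
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must satisfy 0 <= rate < 1")
    amp = np.asarray(amp, dtype=float)
    keep = 1.0 - rate
    out = amp.copy()  # ASSUMPTION: bins outside all bands pass through
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.flatnonzero((freqs >= lo) & (freqs < hi))
        if idx.size == 0:
            continue
        center = 0.5 * (lo + hi)  # ASSUMPTION: arithmetic band center
        # Source frequency whose amplitude lands on each output bin;
        # positions that map outside the band's bins are set to zero.
        src = center + (freqs[idx] - center) / keep
        out[idx] = np.interp(src, freqs[idx], amp[idx], left=0.0, right=0.0)
    return out


def compress_speech(x, fs, rate, frame_len=512, hop=128, edges=BAND_EDGES_HZ):
    """Frame-wise critical-band compression with overlap-add resynthesis."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)

    # Zero-pad so that the last frame is complete.
    n_frames = 1 + max(0, (len(x) - frame_len + hop - 1) // hop)
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(x)] = x

    y = np.zeros_like(padded)
    wsum = np.zeros_like(padded)
    for i in range(n_frames):
        s = i * hop
        spec = np.fft.rfft(padded[s:s + frame_len] * window)
        amp, phase = np.abs(spec), np.angle(spec)
        new_amp = compress_band_amplitudes(amp, freqs, edges, rate)
        # Compressed amplitude combined with the original phase.
        frame = np.fft.irfft(new_amp * np.exp(1j * phase), n=frame_len)
        y[s:s + frame_len] += frame      # overlap-add
        wsum[s:s + frame_len] += window

    # ASSUMPTION: undo the analysis-window gain by the summed window.
    y = (y / np.maximum(wsum, 1e-12))[:len(x)]

    # ASSUMPTION: "normalized by the RMS" means matching the input RMS.
    x_rms = np.sqrt(np.mean(x ** 2))
    y_rms = np.sqrt(np.mean(y ** 2))
    if y_rms > 0.0:
        y *= x_rms / y_rms
    return y
```

For example, `compress_speech(signal, 16000, 0.4)` would produce a 40% compressed version of `signal` under these assumptions, where the 16 kHz sampling rate is itself hypothetical. With `rate=0.0` the sketch returns the input unchanged, which corresponds to the 0% condition used as the original in the experiments.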
