Although the telephone bandwidth is 0-4 kHz, speaker-specific information is not evenly distributed within this range and extends beyond 4 kHz. This is because the hypopharynx, especially the piriform fossa, affects the higher frequency region and contributes more to inter-speaker variation. By shifting higher-frequency bands into lower-frequency regions that carry less speaker-specific information, an effective 4 kHz bandwidth can be constructed that enhances speaker recognition performance. A method for doing this was proposed previously; this paper extends that work and validates it with additional experiments. Furthermore, this paper defines the theoretically possible frequency space to which the frequency shifting method can be applied. To validate the method for different combinations of bands, candidate bands were shifted in various directions in small steps. Speaker recognition experiments were conducted at each step to compare performance against the unshifted baseband. Using the results of these extensive experiments, an approximate frequency space was defined in which frequency shifting outperformed the conventional 0-4 kHz baseband signal. A simplified frequency shifting method was also investigated. Finally, the speech intelligibility of the frequency-shifted narrowband speech signal was analysed using objective speech quality measures. This showed that intelligibility was not significantly affected by the frequency shifting method.