Abstract
Natural speech signals are continuous and highly redundant in time. Human listeners learn this redundancy, and it supports speech comprehension even in challenging circumstances. This work develops a sound synthesis method for studying some time-related characteristics of the human auditory system. The temporal information within each frequency band of a speech signal is down-sampled using Gaussian-shaped pulses, and the bands are then recombined into a new sound that may retain some intelligibility or usable phonetic features. We coin the term “atomic” sound for a sound generated by this method whose spectrogram is extremely sparse in both frequency and time. A battery of speech perception tests was administered to normal-hearing listeners. The results show that (1) atomic sounds derived from clear speech can be understood as speech, although listeners often reported a water-like sound texture; (2) temporal and spectral resolution can be traded off against each other in atomic speech comprehension; and (3) preserving only the single maximum envelope value across a 32-channel filter bank at a rate of 400 Hz was, surprisingly, adequate for speech understanding. The last finding indicates that the brain can organize the single spectral peak within each short interval of no more than 2.5 ms into sentence understanding, without either explicit or implicit encoding of the first three formants.
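The peak-selection step in result (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy envelope values are hypothetical, and the band-pass filtering, envelope extraction, and Gaussian-pulse resynthesis stages are omitted. It only shows that a 400 Hz frame rate corresponds to a 2.5 ms frame, and how one maximum envelope value per frame might be kept.

```python
def frame_period_ms(rate_hz):
    """Duration of one sampling frame in milliseconds."""
    return 1000.0 / rate_hz


def keep_peak_per_frame(envelopes):
    """Keep only the largest channel envelope in each frame.

    envelopes: list of frames, each a list of per-channel envelope values.
    Returns one (channel_index, value) pair per frame.
    """
    peaks = []
    for frame in envelopes:
        # Index of the channel with the largest envelope in this frame.
        channel = max(range(len(frame)), key=lambda c: frame[c])
        peaks.append((channel, frame[channel]))
    return peaks


# Hypothetical toy input: 3 frames of a 4-channel filter bank
# (the study used 32 channels; 4 keeps the example readable).
toy = [
    [0.1, 0.7, 0.2, 0.0],
    [0.3, 0.1, 0.9, 0.4],
    [0.5, 0.2, 0.1, 0.3],
]

print(frame_period_ms(400))      # 2.5
print(keep_peak_per_frame(toy))  # [(1, 0.7), (2, 0.9), (0, 0.5)]
```

In a fuller pipeline, each retained (channel, value) pair would presumably drive one Gaussian-shaped pulse in the corresponding frequency band; that stage is left out here because the abstract does not specify its parameters.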