Human Speech Perception Research Articles

This work addresses spectral analysis of speech signals in automatic speech recognition systems, an actively developing area of acoustic measurement research. Such systems perform poorly under unfavorable speech production conditions (noise, insufficient intelligibility of speech sounds) compared with human perception of spoken language. To improve their efficiency, a two-stage algorithm for spectral analysis of speech signals is proposed. The first stage is parametric spectral analysis of the speech signal using an autoregressive model of the vocal tract of a conditional speaker. The second stage transforms (modifies) the resulting spectral estimate by frequency-selective amplification of the amplitudes of the main formants of the intra-periodic power spectrum. A software implementation of the proposed algorithm, based on the fast Fourier transform, is described. Using the author’s software, a full-scale experiment was carried out on an additive mixture of vowel sounds from a control speaker with white Gaussian noise. The experiment showed that the amplitudes of the main speech signal formants were amplified by 10–20 dB, with a correspondingly significant improvement in the intelligibility of speech sounds. The algorithm is applicable to automatic speech recognition systems that process speech signals in the frequency domain, including those that use artificial neural networks.
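The two stages described in this abstract can be illustrated with a minimal Python sketch. It is not the author's software. The first function estimates an autoregressive (all-pole) power spectrum with the standard autocorrelation (Yule-Walker) method and evaluates it with an FFT. The second function is a hypothetical stand-in for the formant amplification step, whose actual rule the abstract does not specify: it simply stretches the dB spectrum around its median. The names `ar_power_spectrum` and `emphasize_peaks`, the model order, and the `gamma` parameter are all assumptions made for illustration.

```python
import numpy as np


def ar_power_spectrum(x, order=12, n_fft=512):
    """Stage 1 sketch: AR power spectrum via the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Yule-Walker equations R a = -r[1:], where R is a Toeplitz matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    # Variance of the prediction error.
    sigma2 = r[0] + np.dot(a, r[1:])
    # Evaluate A(e^{jw}) = 1 + a1 e^{-jw} + ... on the FFT grid.
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return sigma2 / np.abs(A) ** 2


def emphasize_peaks(psd, gamma=1.5):
    """Stage 2 stand-in (assumption, not the published rule).

    Stretches the spectrum in dB around its median, so spectral peaks
    rise further above the median level. Returns values in dB.
    """
    psd_db = 10.0 * np.log10(psd)
    ref = np.median(psd_db)
    return ref + gamma * (psd_db - ref)
```

To try it, pass a short frame of a sampled vowel to `ar_power_spectrum` and map the bins to frequencies with `np.fft.rfftfreq(n_fft, d=1/fs)`. With `gamma` greater than 1, the peak-to-median contrast of the result grows by exactly that factor in dB.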

Human speech perception is generally optimal in quiet environments; however, it becomes more difficult and error-prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading, i.e., watching the movements of a speaker's mouth and face, either consciously as done by people with hearing loss or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field focusing on facial features as seen in head-mounted displays, including the impacts of display resolution, and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study (N = 21) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution (1832×1920 or 916×960 pixels per eye) and two levels of the native or visually enhanced appearance of a virtual human, the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants' speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel levels of the speech signal that are necessary for a listener to receive the speech correctly 50% of the time. First, we show that the display resolution significantly affected participants' ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited from the head scaling more than the added facial contrast from the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.
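The definition of the speech reception threshold given above (the level at which speech is received correctly 50% of the time) can be made concrete with a small Python sketch. It is not the procedure used in the study, which the abstract does not describe. It assumes that proportion-correct scores have already been measured at several speech levels and that linear interpolation between neighboring points is adequate. The function name `estimate_srt` is hypothetical.

```python
import numpy as np


def estimate_srt(levels_db, proportion_correct, target=0.5):
    """Interpolate the level (dB) at which `target` proportion is correct.

    Assumes scores increase strictly with level (illustrative simplification).
    """
    levels_db = np.asarray(levels_db, dtype=float)
    p = np.asarray(proportion_correct, dtype=float)
    if np.any(np.diff(p) <= 0):
        raise ValueError("proportion_correct must increase strictly with level")
    if not (p[0] <= target <= p[-1]):
        raise ValueError("target is outside the measured range")
    # Invert the measured curve: treat level as a function of score.
    return float(np.interp(target, p, levels_db))
```

For example, with made-up scores of 0.3 at -9 dB and 0.6 at -6 dB, the 50% point falls two thirds of the way between those levels. A lower threshold means the listener tolerates a weaker speech signal.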

Articles published on Human Speech Perception

Release from same-talker speech-in-speech masking: Effects of masker intelligibility and other contributing factors

Two-stage algorithm of spectral analysis for automatic speech recognition systems

Time scale of adaptation at the tonal sequence processing in the awake mice auditory cortex neurons

Developmental hearing loss–induced perceptual deficits are rescued by genetic restoration of cortical inhibition

A module for automatic analysis of burst spectra for consonant place detection

Visual Facial Enhancements Can Significantly Improve Speech Perception in the Presence of Noise.

Comparison of the prediction accuracy of machine learning algorithms in crosslinguistic vowel classification

Design of P-FLANN Model for Intelligent Water Fountain Sound Pleasantness Monitoring Using Bio-inspired Computing and Human Speech Perception

Perceptual specializations for processing species-specific vocalizations in the common marmoset (Callithrix jacchus)

Hearing Dogs and Seeing Barks: Multimodal Sensory Perception of Dogs

Development of an algorithm for characterizing speech production patterns as context-based Cue Production Profiles

Can You Hear Me Now? Sensitive Comparisons of Human and Machine Perception.

Auditory pattern discrimination in budgerigars (Melopsittacus undulatus)

Confusion2Vec 2.0: Enriching ambiguous spoken language representations with subwords.

Degraded cortical temporal processing in the valproic acid-induced rat model of autism

Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR

Frequency-Following Responses to Speech Sounds Are Highly Conserved across Species and Contain Cortical Contributions.

Speech Computations of the Human Superior Temporal Gyrus.

How, when, and where predictions combine with speech in auditory cortex

Automated detection of Glottal-related acoustic cues for feature-cue-based analysis
