Audiovisual Cues Research Articles

This paper proposes a novel lip-reading driven deep learning framework for speech enhancement. The approach leverages the complementary strengths of both deep learning and analytical acoustic modeling (a filtering-based approach), in contrast to benchmark approaches that rely only on deep learning. The proposed audio-visual (AV) speech enhancement framework operates at two levels. At the first level, a novel deep learning based lip-reading regression model is employed. At the second level, the clean-audio features approximated by lip reading are exploited, using an enhanced, visually-derived Wiener filter (EVWF), to estimate the clean audio power spectrum. Specifically, a stacked long short-term memory (LSTM) based lip-reading regression model is designed to estimate the clean audio features using only temporal visual features (i.e., lip reading), by considering a range of prior visual frames. For clean speech spectrum estimation, a new filterbank-domain EVWF is formulated, which exploits the estimated speech features. The EVWF is compared with conventional spectral subtraction and log-minimum mean-square error methods using both ideal AV mapping and LSTM-driven AV mapping approaches. The potential of the proposed AV speech enhancement framework is evaluated under four different dynamic real-world scenarios (cafe, street junction, public transport, and pedestrian area) at different SNR levels (ranging from low to high SNRs) using the benchmark GRID and CHiME-3 corpora. For objective testing, perceptual evaluation of speech quality is used to evaluate the quality of restored speech. For subjective testing, the standard mean-opinion-score method is used with inferential statistics. Comparative simulation results demonstrate significant lip-reading and speech enhancement improvements in terms of both speech quality and speech intelligibility. Ongoing work is aimed at enhancing the accuracy and generalization capability of the deep learning driven lip-reading model, using contextual integration of AV cues, leading to context-aware, autonomous AV speech enhancement.
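The second level described above applies a Wiener-type gain per filterbank channel. The following is a minimal sketch of a generic textbook Wiener gain, G = S / (S + N), and not the paper's EVWF formulation, which the abstract does not specify. The function names, the `floor` parameter, and the example numbers are all hypothetical; in the paper's setting the clean-speech power estimate would come from the lip-reading regression model.

```python
def wiener_gain(clean_power_est, noise_power_est, floor=1e-12):
    """Generic per-channel Wiener gain G = S / (S + N), in [0, 1].

    clean_power_est: estimated clean-speech power per filterbank channel
                     (assumed here to be supplied by some visual model).
    noise_power_est: estimated noise power per channel.
    floor:           small constant that avoids division by zero.
    """
    if len(clean_power_est) != len(noise_power_est):
        raise ValueError("inputs must have the same number of channels")
    gains = []
    for s, n in zip(clean_power_est, noise_power_est):
        s = max(s, 0.0)  # power estimates cannot be negative
        n = max(n, 0.0)
        gains.append(s / max(s + n, floor))
    return gains


def apply_gain(noisy_magnitude, gains):
    """Scale each channel of the noisy magnitude spectrum by its gain."""
    return [g * y for g, y in zip(gains, noisy_magnitude)]


# Illustrative values only: three channels with high, equal, and zero
# estimated speech power relative to the noise.
clean_est = [4.0, 1.0, 0.0]
noise_est = [1.0, 1.0, 2.0]
gains = wiener_gain(clean_est, noise_est)
enhanced = apply_gain([3.0, 2.0, 1.5], gains)
```

Channels where the estimated speech power dominates are passed nearly unchanged, while channels estimated to contain only noise are suppressed toward zero.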

Purpose: Speech perception in noise becomes difficult with age but can be facilitated by audiovisual (AV) speech cues and sentence context in healthy older adults. However, individuals with Alzheimer's disease (AD) may present with deficits in AV integration, potentially limiting the extent to which they can benefit from AV cues. This study investigated the benefit of these cues in individuals with mild cognitive impairment (MCI), individuals with AD, and healthy older adult controls. Method: This study compared auditory-only and AV speech perception of sentences presented in noise. These sentences had one of two levels of context: high (e.g., "Stir your coffee with a spoon") and low (e.g., "Bob didn't think about the spoon"). Fourteen older controls (M age = 72.71 years, SD = 9.39), 13 individuals with MCI (M age = 79.92 years, SD = 5.52), and nine individuals with probable Alzheimer's-type dementia (M age = 79.38 years, SD = 3.40) completed the speech perception task and were asked to repeat the terminal word of each sentence. Results: All three groups benefited (i.e., identified more terminal words) from AV cues and sentence context. Individuals with MCI showed a smaller AV benefit compared to controls in low-context conditions, suggesting difficulties with AV integration. Individuals with AD showed a smaller benefit in high-context conditions compared to controls, indicating difficulties with AV integration and context use in AD. Conclusions: Individuals with MCI and individuals with AD do benefit from AV speech and semantic context during speech perception in noise (albeit to a lesser extent than healthy older adults). This suggests that engaging in face-to-face communication and providing ample context will likely foster more effective communication between patients and caregivers, professionals, and loved ones.

Articles published on Audiovisual Cues

Deep learning for depression recognition with audiovisual cues: A review

Exploring developmental changes in infant anticipation and perceptual processing: EEG responses to tactile stimulation.

Using Optogenetics to Reverse Neuroplasticity and Inhibit Cocaine Seeking in Rats

Nicotine self-administration with menthol and audiovisual cue facilitates differential packaging of CYP2A6 and cytokines/chemokines in rat plasma extracellular vesicles

Multisensory-Guided Associative Learning Enhances Multisensory Representation in Primary Auditory Cortex.

Posttraumatic Stress Disorder Symptoms and Coping Motives are Independently Associated with Cannabis Craving Elicited by Trauma Cues.

A Vision-Based Social Distancing and Critical Density Detection System for COVID-19.

Sign tracking predicts suboptimal behavior in a rodent gambling task.

Directing attention to event changes improves memory updating for older adults.

Lip-Reading Driven Deep Learning Approach for Speech Enhancement

Application of Immersive Virtual Reality to Pragmatics Data Collection Methods

Exogenous Bimodal Cues Attenuate Age-Related Audiovisual Integration.

Keeping in time with social and non-social stimuli: Synchronisation with auditory, visual, and audio-visual cues

Individuals With Mild Cognitive Impairment and Alzheimer's Disease Benefit From Audiovisual Speech Cues and Supportive Sentence Context

Social facilitation for conservation planning: understanding fairy tern behavior and site selection in response to conspecific audio-visual cues

The Effectiveness of the Interaction of Interactive Book Cues and Levels of Information Processing on Learning Retention and External Cognitive Load

Audiovisual integration in macaque face patch neurons

Dopamine neurons gate the intersection of cocaine use, decision making, and impulsivity.

Recognition of valence using QRS complex in children with Autism Spectrum Disorder (ASD)
