Driver emotion recognition is crucial for enhancing safety and the user experience in driving scenarios. However, current emotion recognition methods often rely on a single modality and a single-task setup, leading to suboptimal performance in driving scenarios. To address this, this paper proposes a driver multitask emotion recognition method based on multimodal facial video analysis (MER-MFVA). The method extracts facial expression features and remote photoplethysmography (rPPG) signals from driver facial videos. The facial expression features, comprising facial action units and eye movement information, represent the driver's external characteristics; the rPPG signal, representing the driver's internal characteristics, is enhanced by a dual-path Transformer network with a dedicated focus module. We further propose a cross-modal mutual attention mechanism that fuses the multimodal features by computing mutual attention between the facial expression features and the rPPG information. For the final output, we adopt a multitask learning scheme in which discrete emotion recognition is the primary task, while emotion valence recognition, emotion arousal recognition, and the aforementioned rPPG extraction serve as auxiliary tasks, enabling effective information sharing across tasks. Experimental results on the established driver emotion dataset show that the proposed method significantly improves driver emotion recognition performance, achieving an accuracy of 86.98% and an F1 score of 85.83% on the primary task, which validates the effectiveness of the proposed approach.
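The abstract does not give implementation details of the cross-modal mutual attention fusion or the multitask heads. As a minimal illustrative sketch only, the snippet below shows one plausible two-way attention fusion between facial expression features and rPPG features followed by hypothetical primary and auxiliary task heads; the feature dimensions, module names, symmetric attention formulation, and number of emotion classes are assumptions, not the authors' exact design.

```python
# Illustrative sketch only: cross-modal mutual attention fusion between
# facial-expression features and rPPG features, with multitask output heads.
# All dimensions, names, and the symmetric two-way attention are assumptions.
import torch
import torch.nn as nn


class CrossModalMutualAttention(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # One attention block per direction: expression attends to rPPG and vice versa.
        self.expr_to_rppg = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.rppg_to_expr = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, expr_feat: torch.Tensor, rppg_feat: torch.Tensor) -> torch.Tensor:
        # expr_feat: (batch, T_expr, dim) facial expression token sequence
        # rppg_feat: (batch, T_rppg, dim) rPPG token sequence
        expr_attended, _ = self.expr_to_rppg(expr_feat, rppg_feat, rppg_feat)
        rppg_attended, _ = self.rppg_to_expr(rppg_feat, expr_feat, expr_feat)
        # Pool each attended sequence and concatenate into a joint representation.
        joint = torch.cat([expr_attended.mean(dim=1),
                           rppg_attended.mean(dim=1)], dim=-1)
        return self.fuse(joint)  # (batch, dim) fused multimodal feature


# Hypothetical multitask heads: discrete emotion (primary), valence/arousal (auxiliary).
fusion = CrossModalMutualAttention(dim=128)
emotion_head = nn.Linear(128, 7)   # number of discrete emotion classes is assumed
valence_head = nn.Linear(128, 1)
arousal_head = nn.Linear(128, 1)

expr = torch.randn(2, 32, 128)   # dummy facial-expression features
rppg = torch.randn(2, 64, 128)   # dummy rPPG features
fused = fusion(expr, rppg)
logits = emotion_head(fused)
valence, arousal = valence_head(fused), arousal_head(fused)
```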