A Facial-Expression-Aware Edge AI System for Driver Safety Monitoring
Road safety has emerged as a global issue, driven by the rapid rise in vehicle ownership and traffic congestion. Human error, such as distraction, drowsiness, and panic, is the leading cause of road accidents. Conventional driver monitoring systems (DMSs) frequently fail to detect these emotional and cognitive states, limiting their potential to prevent accidents. To overcome these challenges, this work proposes a robust deep learning-based DMS framework capable of real-time detection of and response to emotion-driven driver behaviors that pose safety risks. The proposed system employs convolutional neural networks (CNNs), specifically the Inception module and a Caffe-based ResNet-10 with a Single Shot Detector (SSD), to achieve efficient, accurate facial detection and classification. The DMS is trained on a comprehensive and diverse dataset drawn from various public and private sources, ensuring robustness across a wide range of emotions and real-world driving scenarios. This approach enables the model to achieve an overall accuracy of 98.6%, an F1 score of 0.979, a precision of 0.980, and a recall of 0.979 across the four emotional states. Compared with existing techniques, the proposed model strikes an effective balance between computational efficiency and complexity, enabling precise recognition of driving-relevant emotions and making it a practical, high-performing solution for real-world in-car driver monitoring.
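The Caffe-based ResNet-10 SSD face detector named in this abstract is the same model distributed with OpenCV's DNN samples, so a minimal loading-and-inference sketch can be given. The file names and the 0.5 confidence threshold below are the standard OpenCV-sample conventions, assumed here rather than confirmed by the paper:

```python
# Minimal face-detection sketch using the Caffe ResNet-10 SSD that ships with
# OpenCV's DNN samples. Model file names and the 0.5 confidence threshold are
# assumptions for illustration, not values taken from the paper.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

frame = cv2.imread("driver.jpg")
h, w = frame.shape[:2]

# The detector expects 300x300 BGR input with the mean values below subtracted.
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()  # shape: (1, 1, N, 7)

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        # Box coordinates are returned normalized to [0, 1].
        box = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
        x1, y1, x2, y2 = box
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
```

Each detected face crop would then be passed to the emotion classifier (the Inception-based CNN) in a full pipeline.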
- Conference Article
3
- 10.1109/icbats54253.2022.9759018
- Feb 16, 2022
The objective is to build an efficient face mask detector using a Single Shot Detector (SSD). The algorithm used for face mask detection was a novel SSD, compared against a Convolutional Neural Network (CNN). The face mask detection dataset was used, and the ability of each algorithm was measured with a sample size of 136. SSD achieved an accuracy of 92.25%, versus 82.6% for CNN. By using VGG-16 as its base architecture, SSD was able to outperform other object detectors such as CNN without compromising speed or accuracy. The difference between SSD and CNN is statistically significant according to an independent-sample t-test (p<0.05) at a 95% confidence level. Face mask detection using SSD was significantly more accurate than CNN.
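For reference, the independent-sample t-test reported here is a one-liner in SciPy. A minimal sketch with placeholder per-run accuracies (the paper's raw measurements are not available):

```python
# Independent-sample t-test comparing two sets of per-run accuracies,
# mirroring the comparison described in the abstract. The accuracy values
# below are placeholders, not data from the paper.
from scipy import stats

ssd_acc = [92.1, 92.4, 92.3, 92.0, 92.5]   # hypothetical SSD runs
cnn_acc = [82.2, 82.9, 82.4, 82.8, 82.7]   # hypothetical CNN runs

t_stat, p_value = stats.ttest_ind(ssd_acc, cnn_acc)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is significant at the 95% confidence level.")
```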
- Research Article
1
- 10.55041/ijsrem30472
- Apr 9, 2024
- International Journal of Scientific Research in Engineering and Management
This project explores advanced techniques in speech recognition, focusing on emotion identification using Convolutional Neural Networks for improved accuracy and real-time processing efficiency. Emotion recognition from speech signals plays a crucial role in various applications, including human-computer interaction, customer service, mental health monitoring, and entertainment. This project proposes an innovative approach to emotion recognition using Convolutional Neural Networks (CNNs) applied to speech data. By leveraging advanced deep learning techniques, the proposed system aims to accurately identify and classify emotions conveyed through vocal expressions. The project begins with a comprehensive review of existing literature on emotion recognition and speech processing, identifying key challenges and opportunities in the field. Building upon prior research, the project introduces a novel CNN architecture optimized for emotion recognition tasks. This architecture is designed to extract relevant features from speech signals and capture subtle nuances indicative of different emotional states. One of the distinguishing features of the proposed approach is its multi-modal integration, which combines information from both audio and visual modalities to enhance emotion recognition accuracy. In addition to analysing speech signals, the system incorporates visual cues such as facial expressions and gestures, providing a more comprehensive understanding of the speaker's emotional state. Real-time processing efficiency is prioritized in the design of the system, ensuring prompt and responsive emotion recognition in interactive applications. Optimization techniques such as model quantization and lightweight architecture design are employed to minimize computational overhead while maintaining high accuracy. To address the variability and subjectivity of emotional expression, the system incorporates user-specific adaptation mechanisms. Through continuous learning and feedback integration, the system dynamically adapts to individual speakers' speech patterns and emotional characteristics, enhancing its ability to accurately recognize emotions in diverse contexts. The project also explores ensemble learning strategies to improve robustness and generalization performance. By combining predictions from multiple CNN models trained on diverse datasets, the system achieves greater resilience to variations in emotional expression and environmental factors. Ethical considerations, including privacy protection and responsible data handling, are integral aspects of the project's design and implementation. Measures are implemented to ensure the ethical collection, storage, and usage of speech data, safeguarding user privacy and maintaining trust in the system. Overall, the proposed system represents a significant advancement in emotion recognition technology, offering a sophisticated and versatile solution for accurately identifying emotions from speech signals. By leveraging deep learning techniques, multi-modal integration, real-time processing optimization, user-specific adaptation, and ensemble learning, the system demonstrates promising potential for various practical applications requiring robust and context-aware emotion recognition capabilities.
Keywords: Speech Recognition, Emotion Identification, Convolutional Neural Networks (CNNs), Real-time Processing, Multi-modal Integration, User-specific Adaptation, Ensemble Learning, Deep Learning, Emotional Expression, Ethical Data Handling
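The abstract does not publish an architecture, but a common front end for speech-emotion CNNs is a log-mel spectrogram. A hedged sketch of that input stage (the librosa parameters and file name are illustrative assumptions, not the project's pipeline):

```python
# Plausible front end for a speech-emotion CNN: log-mel spectrogram features.
# Parameter values (sample rate, n_mels, duration, file name) are
# illustrative assumptions, not details from this project.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000, duration=3.0)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Shape (1, 1, n_mels, frames): a single-channel "image" a 2D CNN can consume.
x = log_mel[np.newaxis, np.newaxis, :, :].astype(np.float32)
print(x.shape)
```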
- Research Article
- 10.48175/ijarsct-5363
- Jun 29, 2022
- International Journal of Advanced Research in Science, Communication and Technology
Driver Assistance and Monitoring Systems play a very important role in traffic management, especially on Indian roads, ultimately reducing accidents and major injuries. A DAMS (Driver Assistance and Monitoring System) provides safety and driving comfort. The main goal of our work is to design an effective methodology for driver assistance and monitoring that alerts the driver when road signs are detected so that the driver can take appropriate action. The proposed methodology detects the road signs present in the dataset under cluttered backgrounds and different lighting conditions, based on colour and shape. The edges of the road signs are detected using the Canny edge operator. The images are enhanced and denoised using median filters, and are classified as stop, no entry, or speed limit using a Convolutional Neural Network (CNN) classifier.
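The median-filter and Canny steps named here map directly onto OpenCV calls. A minimal sketch (the kernel size and thresholds are illustrative choices, not values from the paper):

```python
# Preprocessing steps named in the abstract: median filtering for noise
# removal and Canny edges for sign boundaries. Kernel size and thresholds
# are illustrative, not the paper's values.
import cv2

img = cv2.imread("road_scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

denoised = cv2.medianBlur(gray, 5)       # remove salt-and-pepper noise
edges = cv2.Canny(denoised, 100, 200)    # detect sign edges

cv2.imwrite("sign_edges.png", edges)
```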
- Conference Article
15
- 10.1109/ftc.2016.7821710
- Dec 1, 2016
These days, some robots are given emotional states (expression and recognition) to improve Human-Robot Interaction (HRI) and Robot-Robot Interaction (RRI). In this article we analyze what it means for a robot to have emotion, distinguishing an emotional state used for communication from an emotional state used as a mechanism for organizing the robot's behavior with humans and other robots, via a convolutional neural network (CNN). We discuss these relations and explain why a CNN can be more effective for producing better emotion in robots. Here, we present a multimodal CNN-based system for emotions in robots.
- Research Article
3
- 10.1002/cpe.6356
- May 7, 2021
- Concurrency and Computation: Practice and Experience
Although brain-computer interfaces (BCIs) are progressing rapidly, the desired success has not yet been achieved. One application of BCIs is detecting emotional states in humans. An emotional state is brain activity arising from hormonal and mental responses to events. Owing to this activity, emotions can be detected from electroencephalogram (EEG) signals. Being able to detect the emotional state from EEG signals is important in terms of both time and cost. In this study, a method is proposed for detecting the emotional state from EEG signals. In the proposed method, we aim to classify EEG signals without any transform (Fourier transform, wavelet transform, etc.) or feature extraction method as pre-processing. For this purpose, convolutional neural networks (CNNs) are used as classifiers, together with the SEED EEG dataset containing three different emotional states (positive, negative, and neutral). The recordings used in the study were taken from 15 participants over three sessions. In the proposed method, raw channel-time EEG recordings are converted into 28 × 28 pattern segments without pre-processing. The obtained patterns are then classified by the CNN. As a result of the classification, the three-emotion performance averaged over all participants is found to be 88.84%. Across participants, the highest classification performance is 93.91%, while the lowest is 77.70%. Also, the average F-score is found to be 0.88 for positive emotion, 0.87 for negative emotion, and 0.89 for neutral emotion. Likewise, the average kappa value is 0.82 for positive emotion, 0.81 for negative emotion, and 0.83 for neutral emotion. The results of the proposed method are compared with those of similar studies in the literature. We conclude that the proposed method performs at an acceptable level.
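A minimal sketch of the described pipeline, slicing a raw channel-time recording into 28 × 28 segments and feeding a small CNN. The network layout and the fake recording shape are stand-ins, not the paper's exact configuration:

```python
# Sketch of the idea described above: cut raw channel-time EEG into 28x28
# segments and classify them with a small CNN. The architecture below is a
# generic stand-in, not the paper's network.
import numpy as np
import torch
import torch.nn as nn

def to_segments(eeg, size=28):
    """Slice a (channels, samples) recording into (N, 1, 28, 28) patches."""
    ch, t = eeg.shape
    patches = [eeg[r*size:(r+1)*size, c*size:(c+1)*size]
               for r in range(ch // size) for c in range(t // size)]
    return torch.tensor(np.stack(patches), dtype=torch.float32).unsqueeze(1)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 3),  # positive / negative / neutral
)

eeg = np.random.randn(56, 280)       # placeholder multi-channel recording
logits = model(to_segments(eeg))
print(logits.shape)                  # (N, 3) class scores per segment
```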
- Research Article
- 10.48175/ijarsct-11409
- Jun 12, 2023
- International Journal of Advanced Research in Science, Communication and Technology
Music is an exemplary tool for judging a person's emotional state. It is the language of the soul: what cannot be articulated through words is easily conveyed through a melody. Music not only speaks to a person's emotional and mental state, but is also known to have a therapeutic effect on the listener. Traditional music recommendation uses collaborative or content-based filtering to recommend songs, but a person's song choices do not depend only on the songs they usually listen to; they depend mostly on their emotional state. Despite fast-paced innovation in the music application industry, there is still scope for improving the user experience and creating an encompassing application that not only lets users enjoy their favorite songs but also tailors recommendations to their emotional state. Thus, an emotion detection system using Convolutional Neural Networks has been proposed. The user feeds in a custom playlist containing a mixture of musical genres that are classified into different emotions using K-Means clustering. The CNN model detects the emotional state of the user and recommends a series of songs from the classified playlist. This interactive interface is a revolutionary innovation for users who need song recommendations that suit their current state of mind.
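The playlist-clustering step could look like the following scikit-learn sketch. The feature columns and the four-cluster choice are illustrative assumptions, not the paper's setup:

```python
# Sketch of the playlist step described above: group songs into emotion
# clusters with K-Means over simple audio features. Feature names, values,
# and the four-cluster choice are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per song: e.g. [tempo, energy, valence] (placeholder values).
features = np.array([
    [120, 0.9, 0.8],
    [ 70, 0.2, 0.1],
    [ 95, 0.5, 0.6],
    [140, 0.8, 0.3],
    [ 60, 0.3, 0.9],
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster id per song; map clusters to emotions afterwards
```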
- Research Article
- 10.3390/s25216755
- Nov 4, 2025
- Sensors
Drowsy driving is a major cause of traffic accidents worldwide, and its early detection remains essential for road safety. Conventional driver monitoring systems (DMS) primarily rely on behavioral indicators such as eye closure, gaze, or head pose, which typically appear only after a significant decline in alertness. This study explores the potential of facial near-infrared (NIR) imaging as a hypothetical physiological indicator of drowsiness. Because NIR light penetrates more deeply into biological tissue than visible light, it may capture subtle variations in blood flow and oxygenation near superficial vessels. Based on this hypothesis, we conducted a pilot feasibility study involving young adult participants to investigate whether drowsiness levels could be estimated from single-frame NIR facial images acquired at 940 nm—a wavelength already used in commercial DMS and suitable for both physiological sensitivity and practical feasibility. A convolutional neural network (CNN) was trained to classify multiple levels of drowsiness, and Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to interpret the discriminative regions. The results showed that classification based on 940 nm NIR images is feasible, achieving an optimal accuracy of approximately 90% under the binary classification scheme (Pattern A). Grad-CAM revealed that regions around the nasal dorsum contributed to this, consistent with known physiological signs of drowsiness. These findings support the feasibility of NIR-based drowsiness classification in young drivers and provide a foundation for future studies with larger and more diverse populations.
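Grad-CAM, used above for interpretation, can be sketched in a few lines of PyTorch with forward and backward hooks. The backbone, layer choice, and input here are generic stand-ins rather than the study's code:

```python
# Minimal Grad-CAM sketch: weight the last conv block's activation maps by
# their average gradients, then ReLU and upsample. Generic ResNet stand-in,
# not the study's CNN; the random input stands in for an NIR frame.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)
model.eval()

acts, grads = {}, {}
layer = model.layer4  # last conv block

layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)   # placeholder frame (NIR replicated to 3 ch)
score = model(x)[0].max()         # score of the top class
score.backward()

w = grads["v"].mean(dim=(2, 3), keepdim=True)        # channel importance
cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0,1]
```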
- Conference Article
1
- 10.1117/12.2667404
- Feb 22, 2023
With the rapid development of artificial intelligence technology, emotion recognition has been applied in all aspects of life, and using eye-movement tracking for emotion recognition has become an important branch of affective computing. To explore the relationship between eye-movement signals and learners' emotional states in an online video learning environment, we used machine learning and convolutional neural network methods to recognize eye-movement signals and classify learners' emotional states into two categories, positive and negative. The study of eye-movement data under different time windows comprises four stages: data collection, data preprocessing, classifier modeling, and training and testing. In this paper, an Eye-movement Feature Extraction Classification Network (EFECN) based on a convolutional neural network is proposed for small-sample, eye-movement-based emotion classification. The eye-movement data were transformed into images through cross-modal conversion and used as input to several different deep convolutional neural networks, and the emotional states were classified as positive or negative. Accuracy was used as the evaluation index to compare the different models. The eye-movement emotion recognition accuracy reached 72% with the SVM model and 91.62% with the EFECN model. Experimental results show that the deep convolutional neural network achieves a significant improvement in recognition accuracy over traditional machine learning methods.
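The abstract does not specify the cross-modal conversion; one plausible reading is rasterizing the gaze time series into a fixation heatmap, sketched below under that assumption:

```python
# One plausible reading of "cross-modal conversion": accumulate a gaze
# (x, y) time series into a 2D heatmap a CNN can ingest. The paper's
# actual conversion is not specified in this abstract.
import numpy as np

def gaze_to_image(xs, ys, size=64):
    """Accumulate normalized gaze samples into a size x size heatmap."""
    heat, _, _ = np.histogram2d(ys, xs, bins=size, range=[[0, 1], [0, 1]])
    return heat / (heat.max() + 1e-8)

xs = np.random.rand(500)   # placeholder normalized gaze coordinates
ys = np.random.rand(500)
img = gaze_to_image(xs, ys)
print(img.shape)           # (64, 64), ready as a CNN input channel
```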
- Book Chapter
2
- 10.1007/978-981-13-9942-8_41
- Jan 1, 2019
In the domains of intelligent monitoring, smart driving, and robotics, pedestrian intention detection is a prime discipline of object recognition. Several pedestrian detection techniques have been proposed; however, just a handful address pedestrian 'intention' detection. Due to the complexity of image backgrounds and the diversity of pedestrian postures, pedestrian intention detection remains a challenge that requires concise algorithms. In this paper, the Single Shot Detector (SSD) is compared with the Faster Region Convolutional Neural Network (Faster RCNN) architecture by applying different Convolutional Neural Network (CNN) models. Experiments were conducted across a wide spectrum to obtain various models of Faster RCNN and SSD through compatible alterations to the algorithms and parameter tuning. In this paper, the Faster RCNN and SSD architectures are trained and their results compared. New, simple evaluation parameters are suggested, namely the Percentage Detection Index, Percentage Recognition Index, and Precision score, as alternatives to the traditional mean average precision (mAP) found in the literature. When trained on 1350 images, Faster RCNN learned three times faster than SSD with 2% higher accuracy.
- Conference Article
27
- 10.1109/icccnt45670.2019.8944491
- Jul 1, 2019
In the wake of Advanced Driving Assistance Systems (ADAS), intelligent driving, and traffic safety, object detection plays a crucial role in the upcoming generation of self-governing vehicles. Traditional computer vision and machine learning approaches to object detection face challenges from difficult image backgrounds and environmental conditions such as sunlight effects, barricades, and occlusions. In this paper, the Single Shot Detector (SSD), Faster Region Convolutional Neural Network (Faster RCNN), and You Only Look Once (YOLOv2) deep learning architectures are compared by applying distinct pretrained Convolutional Neural Network (CNN) models. Experiments were organized across a wide range to obtain distinct models of Faster RCNN, SSD, and YOLOv2 through appropriate algorithm modifications and parameter tuning. In this work, SSD, Faster RCNN, and YOLOv2 are trained on five object classes of traffic signs and their outcomes evaluated. Traditional evaluation parameters, mAP (mean average precision, built on precision, recall, and IoU) and FPS (frames per second), are used to analyze the accuracy and speed of the algorithms. On analysis, YOLOv2 outperforms Faster RCNN and SSD in accuracy by 3.5% and 21%, respectively. YOLOv2 also learned three times faster than Faster RCNN with increased accuracy.
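The IoU term inside the mAP metric used in both of the preceding detector comparisons can be stated exactly. A self-contained helper (the box values in the usage line are made up):

```python
# Intersection-over-Union for two boxes in (x1, y1, x2, y2) corner format:
# the overlap area divided by the union area, the core term in mAP.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.143 (made-up boxes)
```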
- Research Article
23
- 10.1109/taffc.2020.3023966
- Jan 1, 2023
- IEEE Transactions on Affective Computing
Player-based emotion recognition can help in understanding game players' emotional states, contributing to improving a game's quality and value. This article develops a hybrid neural network learning framework called the convolutional smooth feedback fuzzy network (CSFFN) to detect a player's emotional states in real time during gameplay from electroencephalogram (EEG) signals. Specifically, CSFFN rationally combines a convolutional neural network (CNN), a fuzzy neural network (FNN), and a recurrent neural network (RNN). The CNN not only captures spatial characteristics of EEG signals across channels but also suppresses noise in the EEG signals, improving accuracy and noise robustness in game emotion recognition. The FNN extracts the membership degrees of a player's different emotional states, further improving recognition accuracy. Since a player's current emotional state is influenced by previous emotional states during gameplay, an RNN is employed to capture the temporal characteristics of the EEG signals, improving recognition accuracy still further. Experimental results show that CSFFN has higher recognition accuracy and noise resistance in identifying four emotional states (happiness, sadness, superiority, and anger) compared with support vector machines (SVMs) with different kernels, linear discriminant analysis (LDA), AlexNet, and VGG16.
- Research Article
70
- 10.1109/access.2019.2921027
- Jan 1, 2019
- IEEE Access
A major rise in the prevalence and influence of colorectal cancer (CRC) leads to substantially increasing healthcare costs and even death. It is widely accepted that early detection and removal of colonic polyps can prevent CRC. Detecting colonic polyps in colonoscopy videos is problematic because of the complex environment of the colon and the varied shapes of polyps. Researchers have shown the feasibility of Convolutional Neural Network (CNN)-based polyp detection, but better feature extractors are needed to improve detection performance. In this paper, we investigated the potential of the single shot detector (SSD) framework for detecting polyps in colonoscopy videos. SSD is a one-stage method that uses a feed-forward CNN to produce a collection of fixed-size bounding boxes for each object from different feature maps. Three feature extractors, ResNet50, VGG16, and InceptionV3, were assessed, with multi-scale feature maps integrated into SSD designed for ResNet50 and InceptionV3, respectively. We validated this method on the 2015 MICCAI polyp detection challenge datasets, comparing it with the teams that entered the challenge, with YOLOv3, and with the two-stage method Faster-RCNN. Our results demonstrated that the proposed method surpassed all the challenge teams as well as YOLOv3, and was comparable with the two-stage method. In terms of detection speed in particular, our proposed method outperformed all the others and met real-time application requirements. We also found that, among all the feature extractors, InceptionV3 obtained the best precision and recall. In conclusion, the SSD-based method achieved excellent detection performance for polyps and can potentially improve diagnostic accuracy and efficiency.
- Conference Article
8
- 10.1109/icbsii51839.2021.9445124
- Mar 25, 2021
In a teleoperation mechanism, surgical robots are controlled using hand gestures from a remote location. Remote robotic arm control using hand gesture recognition is a challenging computer vision problem, and hand action recognition in complex environments (cluttered backgrounds, lighting variation, scale variation, etc.) is a difficult and time-consuming process. In this paper, a lightweight Convolutional Neural Network (CNN) model, Single Shot Detector (SSD) Lite MobileNet-V2, is proposed for real-time hand gesture recognition. SSD Lite versions can run hand gesture recognition applications on low-power computing devices like the Raspberry Pi owing to their light weight and timely recognition. The model is deployed using a camera and two Raspberry Pi controllers: Raspberry Pi controller 1 performs the hand gesture recognition and transfers data to the cloud server, while Raspberry Pi controller 2 receives the cloud information and controls the robotic arm. The performance of the proposed model is also compared with an SSD Inception-V2 model on the MITI Hand dataset-II (MITI HD-II). The average precision, average recall, and F1-score of the SSD Lite MobileNet-V2 and SSD Inception-V2 models are analyzed by training and testing with a learning rate of 0.0002 using the Adam optimizer. The SSD Lite MobileNet-V2 model obtained an average precision of 98.74% and the SSD Inception-V2 model 99.27%. However, prediction with the SSD Lite MobileNet-V2 model on the Raspberry Pi controller takes only 0.67 s, versus 1.2 s for the SSD Inception-V2 model.
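On-device inference of an SSD Lite MobileNet-V2 on a Raspberry Pi is commonly done with tflite-runtime. A hedged sketch of that loop (the model file name is a placeholder, and the paper's actual deployment stack and Pi-to-cloud transport are not specified here):

```python
# Sketch of on-device SSD inference on a Raspberry Pi with tflite-runtime.
# The model file name is a placeholder; the paper's deployment details
# (framework, cloud transport between the two Pis) are assumptions.
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="ssdlite_mobilenet_v2_hand.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # placeholder camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()

boxes = interpreter.get_tensor(out[0]["index"])  # output order is model-specific
print(boxes.shape)
```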
- Supplementary Content
5
- 10.1155/2022/6238172
- Feb 15, 2022
- Journal of Healthcare Engineering
Emotion recognition means the automatic identification of a human's emotional state from his or her physiological or non-physiological signals. The EEG-based method is an effective mechanism commonly used for recognizing emotions in real environments. In this paper, a convolutional neural network is used to classify EEG signals into three and four emotional states on the DEAP dataset (a Database for Emotion Analysis using Physiological signals). For this purpose, a high-order cross-feature sample is extracted to recognize the emotional state from a single channel. A seven-layer convolutional neural network is used to classify the 32-channel EEG signal, and the average accuracy for four and three emotional states is 65% and 58.62%, respectively. The single-channel high-order cross-sample is classified with convolutional neural networks, and the average accuracy for four emotional states is 43.5%. Among all the channels related to emotion recognition, the F4 channel achieves the best classification accuracy of 44.25%, and the average accuracy of the even-numbered channels is higher than that of the odd-numbered channels. The proposed method provides a basis for real-time application of EEG-based emotion recognition.
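If the "high-order cross features" follow the usual higher-order-crossings (HOC) construction for EEG, they are zero-crossing counts of successively differenced signals. A sketch under that assumption (whether it matches the paper's exact feature is not confirmed by the abstract):

```python
# Higher-order crossings (HOC): zero-crossing counts of the signal and its
# successive differences, a common single-channel EEG feature family.
# That this matches the paper's "high-order cross features" is an assumption.
import numpy as np

def hoc_features(signal, order=10):
    feats = []
    z = np.asarray(signal, dtype=float)
    for _ in range(order):
        sign = np.signbit(z)
        feats.append(int(np.sum(sign[1:] != sign[:-1])))  # count crossings
        z = np.diff(z)  # difference once more for the next order
    return feats

eeg_channel = np.random.randn(1024)  # placeholder single-channel EEG
print(hoc_features(eeg_channel))
```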
- Research Article
- 10.24312/ucp-jeit.01.01.168
- Oct 30, 2023
- UCP Journal of Engineering & Information Technology
Our daily lives depend on vehicles for transportation, work, and adventure. Traffic accidents are a global issue, harming individuals, communities, and societies; they often stem from human error, distraction, and poor judgment. This report underlines the terrible consequences of highway accidents, from death to economic costs. Many variables make accidents hard to eliminate, but preventing them is crucial. Accidents are caused by human error, particularly distracted and inebriated driving. This study emphasizes the road-safety benefits of driver status detection systems: technologies that evaluate driver safety via real-time monitoring of attentiveness, weariness, and impairment. Driver fatigue, alcohol or drug impairment, and cognitive distraction all raise accident risk, and eye tracking and facial recognition offer real-time ways to address these factors. Driver status detection devices warn drivers before tired or distracted driving causes a serious accident, so their adoption is vital; addressing these concerns may reduce accidents and save lives. In this study, the Vehicle Sensory Safety System actively analyzes the driver's physiological state using heart rate (BPM) and grip pressure. These dynamic data reveal the driver's emotional and physical state in real time, and the system alerts the driver with a buzzer, a vibrating steering wheel, and a flashing LED strip. This project could improve road safety for all users and reduce highway accidents.
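The threshold-and-alert logic described here can be sketched in a few lines. The safe bands, sensor-read helpers, and actuator stub below are all hypothetical, not values or drivers from the study:

```python
# Sketch of the alert logic described above: monitor heart rate (BPM) and
# grip pressure, and trigger warnings when readings leave a safe band.
# Thresholds and the sensor/actuator stubs are hypothetical placeholders.
import time

BPM_RANGE = (50, 110)   # assumed safe heart-rate band, illustrative only
GRIP_MIN = 0.15         # assumed minimum normalized grip pressure

def read_bpm():         # placeholder for the pulse-sensor driver
    return 72.0

def read_grip():        # placeholder for the pressure-sensor driver
    return 0.4

def trigger_alerts():
    # Real hardware would drive the buzzer, steering-wheel vibrator, and
    # LED strip here (e.g. via GPIO); printing stands in for the actuators.
    print("ALERT: driver state abnormal")

while True:
    bpm, grip = read_bpm(), read_grip()
    if not (BPM_RANGE[0] <= bpm <= BPM_RANGE[1]) or grip < GRIP_MIN:
        trigger_alerts()
    time.sleep(0.5)  # poll at 2 Hz
```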