Abstract

Hearing disabilities affect people all over the world; nearly 360 million people live with disabling hearing loss. In recent years there has been a rapid increase in the number of people who are deaf or mute due to birth defects, accidents and oral diseases. Because deaf and mute people cannot communicate with hearing people through speech, they must rely on some form of visual communication, yet sign language is not understood by the majority of hearing people. The objective of our project is to bridge the gap between the hearing-impaired community and the hearing community and to enable two-way communication. We propose a two-way sign language translator that converts voice to sign language and vice versa in real time. The video is processed frame by frame using the OpenCV library in Python. A background subtraction method, the Gaussian Mixture-based Background/Foreground Segmentation algorithm, is then applied to each frame to remove the background. Contours of the processed image are passed to a Deep Neural Network (DNN) that classifies each frame into the corresponding written-language word. For conversion from sign language to voice, we use basic Natural Language Processing and gTTS to preserve the grammar of the sign language. The enrolment rate and literacy among hearing-impaired children are far below the average for the population at large; this technology can help mainstream schools better integrate the hearing-impaired community, making education more accessible and affordable.

Speech Emotion Recognition (SER) is a vital aspect of human-computer interaction and emotional intelligence applications. Our SER module employs Mel-frequency cepstral coefficients (MFCC) for feature extraction, followed by a Convolutional Neural Network (CNN) with a customized architecture for emotion classification. The methodology involves transforming audio samples into MFCC images, enabling the application of CNNs typically used in computer vision tasks. This fusion of traditional audio analysis with deep learning harnesses the power of MFCC's spectral representation and the CNN's spatial pattern recognition. By modifying the CNN architecture, the model can better discern emotional cues in the MFCC images, enhancing SER performance. This strategy offers SER a viable path forward, with the ability to identify complex emotional expressions, and addresses the need for feature-rich representations and sophisticated modelling techniques to increase the accuracy of emotion recognition.

KEYWORDS: OpenCV, DNN, SER, CNN, MFCC
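
The frame-processing pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes OpenCV's `createBackgroundSubtractorMOG2` as the Gaussian Mixture-based Background/Foreground Segmentation step, and the DNN classifier is represented by a hypothetical `classify_sign` placeholder.

```python
import cv2

# Gaussian Mixture-based Background/Foreground Segmentation (MOG2 in OpenCV).
# Parameter values are illustrative; the paper does not state the ones it uses.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

def classify_sign(region):
    """Placeholder for the paper's DNN that maps a segmented frame to a word."""
    raise NotImplementedError

cap = cv2.VideoCapture(0)  # live camera feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)                 # subtract the background
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)     # assume the largest contour is the signing hand
        x, y, w, h = cv2.boundingRect(hand)
        word = classify_sign(fg_mask[y:y + h, x:x + w])
cap.release()
```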
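
For the sign-to-voice direction, the abstract mentions gTTS for speech synthesis. A minimal usage sketch, assuming the recognized and grammar-corrected sentence is already available as a string:

```python
from gtts import gTTS

# Hypothetical output of the sign-recognition and NLP step.
sentence = "I am going to school"

# Google Text-to-Speech: synthesize the sentence and save it as an MP3 file.
tts = gTTS(text=sentence, lang="en")
tts.save("spoken_output.mp3")
```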
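
The SER pipeline converts audio clips into MFCC "images" and feeds them to a CNN. The customized architecture is not described in the abstract, so the sketch below uses librosa for MFCC extraction and a generic Keras CNN purely as an illustration; the number of MFCC coefficients, time frames and emotion classes are assumptions.

```python
import librosa
import numpy as np
import tensorflow as tf

def audio_to_mfcc_image(path, n_mfcc=40, max_frames=174):
    """Convert an audio clip into a fixed-size MFCC 'image' with a channel axis."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc[..., np.newaxis]

# Generic CNN for emotion classification (8 emotion classes assumed).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(40, 174, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(8, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```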
