Mel Spectrogram Research Articles

Patients with schizophrenia experience the most prolonged hospital stay in Japan. Also, the high re-hospitalization rate affects their quality of life (QoL). Despite being an effective predictor of treatment, QoL has not been widely utilized due to time constraints and lack of interest. As such, this study aimed to estimate the schizophrenic patients' subjective quality of life using speech features. Specifically, this study uses speech from patients with schizophrenia to estimate the subscale scores, which measure the subjective QoL of the patients. The objectives were to (1) estimate the subscale scores from different patients or cross-sectional measurements, and 2) estimate the subscale scores from the same patient in different periods or longitudinal measurements. A conversational agent was built to record the responses of 18 schizophrenic patients on the Japanese Schizophrenia Quality of Life Scale (JSQLS) with three subscales: "Psychosocial," "Motivation and Energy," and "Symptoms and Side-effects." These three subscales were used as objective variables. On the other hand, the speech features during measurement (Chromagram, Mel spectrogram, Mel-Frequency Cepstrum Coefficient) were used as explanatory variables. For the first objective, a trained model estimated the subscale scores for the 18 subjects using the Nested Cross-validation (CV) method. For the second objective, six of the 18 subjects were measured twice. Then, another trained model estimated the subscale scores for the second time using the 18 subjects' data as training data. Ten different machine learning algorithms were used in this study, and the errors of the learned models were compared. The results showed that the mean RMSE of the cross-sectional measurement was 13.433, with k-Nearest Neighbors as the best model. Meanwhile, the mean RMSE of the longitudinal measurement was 13.301, using Random Forest as the best. RMSE of less than 10 suggests that the estimated subscale scores using speech features were close to the actual JSQLS subscale scores. Ten out of 18 subjects were estimated with an RMSE of less than 10 for cross-sectional measurement. Meanwhile, five out of six had the same observation for longitudinal measurement. Future studies using a larger number of subjects and the development of more personalized models based on longitudinal measurements are needed to apply the results to telemedicine for continuous monitoring of QoL.

Read full abstract

Driver fatigue detection is one of the essential tools to reduce accidents and improve traffic safety. Its main challenge lies in the problem of how to identify the driver's fatigue state accurately. Existing detection methods include yawning and blinking based on facial expressions and physiological signals. Still, lighting and the environment affect the detection results based on facial expressions. In contrast, the electroencephalographic (EEG) signal is a physiological signal that directly responds to the human mental state, thus reducing the impact on the detection results. This paper proposes a log-Mel spectrogram and Convolution Recurrent Neural Network (CRNN) model based on EEG to implement driver fatigue detection. This structure allows the advantages of the different networks to be exploited to overcome the disadvantages of using them individually. The process is as follows: first, the original EEG signal is subjected to a one-dimensional convolution method to achieve a Short Time Fourier Transform (STFT) and passed through a Mel filter bank to obtain a logarithmic Mel spectrogram, and then the resulting logarithmic Mel spectrogram is fed into a fatigue detection model to complete the fatigue detection task for the EEG signals. The fatigue detection model consists of a 6-layer convolutional neural network (CNN), bi-directional recurrent neural networks (Bi-RNNs), and a classifier. In the modeling phase, spectrogram features are transported to the 6-layer CNN to automatically learn high-level features, thereby extracting temporal features in the bi-directional RNN to obtain spectrogram-temporal information. Finally, the alert or fatigue state is obtained by a classifier consisting of a fully connected layer, a ReLU activation function, and a softmax function. Experiments were conducted on publicly available datasets in this study. The results show that the method can accurately distinguish between alert and fatigue states with high stability. In addition, the performance of four existing methods was compared with the results of the proposed method, all of which showed that the proposed method could achieve the best results so far.

Read full abstract

Mel Spectrogram Research Articles

Related Topics

Articles published on Mel Spectrogram

Self-supervised learning–based underwater acoustical signal classification via mask modeling

Analysis of Application and Creation Skills of Story-Based MV Micro Video and Big Multimedia Data in Music Communication

Cognitive Load Assessment of Air Traffic Controller Based on SCNN-TransE Network Using Speech Data

Assamese Dialect Identification System using Convolution Neural Networks

FPGA Implementation of Keyword Spotting System Using Depthwise Separable Binarized and Ternarized Neural Networks.

Feeding intensity assessment of aquaculture fish using Mel Spectrogram and deep learning algorithms

Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition

Voice pathology identification system using a deep learning approach based on unique feature selection sets

Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion.

CNN-Based Identification of Parkinson's Disease from Continuous Speech in Noisy Environments.

Speech Emotion Recognition

Comparative Study of Mfcc and Mel Spectrogram for Raga Classification Using CNN

Music sentiment classification based on an optimized CNN-RF-QPSO model

Estimation of subjective quality of life in schizophrenic patients using speech features.

EEG driving fatigue detection based on log-Mel spectrogram and convolutional recurrent neural networks.

Multi-modal approach for COVID-19 detection using coughs and self-reported symptoms

Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Influence of several audio parameters in urban sound event classification

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Mel Spectrogram Research Articles

Related Topics

Articles published on Mel Spectrogram

Self-supervised learning–based underwater acoustical signal classification via mask modeling

Analysis of Application and Creation Skills of Story-Based MV Micro Video and Big Multimedia Data in Music Communication

Cognitive Load Assessment of Air Traffic Controller Based on SCNN-TransE Network Using Speech Data

Assamese Dialect Identification System using Convolution Neural Networks

FPGA Implementation of Keyword Spotting System Using Depthwise Separable Binarized and Ternarized Neural Networks.

Feeding intensity assessment of aquaculture fish using Mel Spectrogram and deep learning algorithms

Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition

Voice pathology identification system using a deep learning approach based on unique feature selection sets

Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion.

CNN-Based Identification of Parkinson's Disease from Continuous Speech in Noisy Environments.

Speech Emotion Recognition

Comparative Study of Mfcc and Mel Spectrogram for Raga Classification Using CNN

Music sentiment classification based on an optimized CNN-RF-QPSO model

Estimation of subjective quality of life in schizophrenic patients using speech features.

EEG driving fatigue detection based on log-Mel spectrogram and convolutional recurrent neural networks.

Multi-modal approach for COVID-19 detection using coughs and self-reported symptoms

Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Influence of several audio parameters in urban sound event classification