Gammatone Frequency Cepstral Coefficients Research Articles

The work presented in this paper aims to improve the performance of end-to-end (E2E) speech recognition for children's speech under low-resource conditions. For the majority of languages, there is hardly any speech data from child speakers, and even the available children's speech corpora contain only a limited number of hours of data. On the other hand, large amounts of adults' speech data are freely available for research as well as commercial purposes. As a consequence, developing an effective E2E automatic speech recognition (ASR) system for children is a very challenging task. One may develop an ASR system using adults' speech and then use it to transcribe children's data, but this leads to very poor recognition rates because of the stark differences between the acoustic attributes of adults' and children's speech. To overcome these hurdles and develop a robust children's ASR system employing an E2E architecture, we have resorted to several out-of-domain and in-domain data augmentation techniques. For out-of-domain data augmentation, we have explicitly modified adults' speech to render it acoustically similar to children's speech before pooling it into training. For in-domain data augmentation, by contrast, we have slightly modified the pitch and duration of children's speech to create more data capturing greater diversity. Data augmentation helps mitigate, to a certain extent, the ill effects of the scarcity of data from the child domain, which in turn reduces the error rates by a large margin. In addition to data augmentation, we have also studied the efficacy of Gammatone frequency cepstral coefficients (GFCC) and frequency-domain linear prediction (FDLP) alongside the most commonly used Mel-frequency cepstral coefficients (MFCC) for front-end speech parameterization. Both MFCC and GFCC capture and model the spectral envelope of speech.
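
The abstract names GFCC but does not specify how it is computed. The following is a minimal sketch, assuming one common recipe: a bank of 4th-order gammatone filters with center frequencies equally spaced on the Glasberg-Moore ERB-rate scale, per-frame channel energies, cubic-root compression, and a DCT. The function names and all parameter values (32 channels, 13 coefficients, 25 ms frames, 10 ms hop, 50 Hz lowest center frequency) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np


def erb_space(f_low, f_high, n):
    """n center frequencies equally spaced on the Glasberg-Moore ERB-rate scale."""
    lo, hi = 21.4 * np.log10(1 + 0.00437 * np.array([f_low, f_high]))
    rates = np.linspace(lo, hi, n)
    return (10 ** (rates / 21.4) - 1) / 0.00437


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Unit-energy impulse response of a gammatone filter centered at fc (Hz)."""
    t = np.arange(int(round(duration * fs))) / fs
    bandwidth = 1.019 * 24.7 * (0.00437 * fc + 1)  # 1.019 * ERB(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def dct_ii_matrix(n_in, n_out):
    """Rows are the first n_out (unnormalized) DCT-II basis vectors of length n_in."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n_in))


def gfcc(signal, fs, n_filters=32, n_ceps=13, frame_len=0.025, hop=0.010, f_low=50.0):
    """Return an (n_frames, n_ceps) array of GFCC-style features (hypothetical helper)."""
    centers = erb_space(f_low, 0.9 * fs / 2, n_filters)
    # Causal filtering of the whole signal through each gammatone channel.
    bands = np.stack([np.convolve(signal, gammatone_ir(fc, fs))[: len(signal)] for fc in centers])
    flen, fhop = int(round(frame_len * fs)), int(round(hop * fs))
    n_frames = 1 + (len(signal) - flen) // fhop
    starts = np.arange(n_frames) * fhop
    # Mean energy of each channel in each frame: shape (n_frames, n_filters).
    energies = np.stack([np.mean(bands[:, s : s + flen] ** 2, axis=1) for s in starts])
    compressed = np.cbrt(energies)  # cubic-root loudness compression
    return compressed @ dct_ii_matrix(n_filters, n_ceps).T
```

Replacing the gammatone filterbank with a Mel-spaced triangular filterbank on a power spectrum, and the cube root with a logarithm, would turn this same pipeline into an MFCC-style front end; the two features differ mainly in the auditory filter model and the compression.
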
On the other hand, applying linear prediction to the frequency-domain representation of the speech signal effectively captures the temporal envelope during front-end feature extraction. FDLP features, which model the temporal envelope, provide important cues for the perception and understanding of stop bursts and, at times, complete phonemes. This motivated us to perform a comparative experimental study of the effectiveness of the three aforementioned front-end acoustic features. In our experiments, the proposed data augmentation combined with FDLP features yielded a relative improvement of 67.6% in character error rate over the baseline system. Combining data augmentation with MFCC or GFCC features resulted in lower recognition performance.
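
FDLP is described above only at a high level. A minimal sketch, assuming the standard formulation: take the DCT of a (typically long) signal segment, fit an all-pole model to the DCT coefficients with autocorrelation-method linear prediction, and read the model's power response as a smoothed estimate of the squared temporal (Hilbert) envelope, where the model's frequency axis from 0 to π maps onto time across the segment. The function names, the model order of 20, and the 100-point output grid are illustrative assumptions; full FDLP front ends typically also split the DCT into sub-bands before prediction, which this sketch omits.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II via an FFT of the even-symmetric extension of x."""
    n = len(x)
    spectrum = np.fft.fft(np.concatenate([x, x[::-1]]))
    twiddle = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    return 0.5 * np.real(twiddle * spectrum[:n])


def lpc(seq, order):
    """Autocorrelation-method linear prediction; returns A(z) coefficients and gain."""
    n = len(seq)
    # Biased autocorrelation for lags 0..order, computed via a zero-padded FFT.
    r = np.fft.irfft(np.abs(np.fft.rfft(seq, 2 * n)) ** 2, 2 * n)[: order + 1]
    toeplitz = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]
    a = np.linalg.solve(toeplitz, r[1:])  # predictor coefficients a_1..a_p
    gain = r[0] - a @ r[1:]  # prediction-error power
    return np.concatenate(([1.0], -a)), gain


def fdlp_envelope(segment, order=20, n_points=100):
    """All-pole estimate of the squared temporal envelope of `segment` (hypothetical helper)."""
    coeffs = dct_ii(np.asarray(segment, dtype=float))
    a_poly, gain = lpc(coeffs, order)
    # |A(e^{jw})|^2 on w = pi*m/n_points; w in [0, pi) maps to time 0..T.
    response = np.fft.rfft(a_poly, 2 * n_points)[:n_points]
    return gain / np.abs(response) ** 2
```

This is the dual of ordinary time-domain linear prediction: because the DCT turns time into the "frequency" axis of the all-pole model, a short energy burst in the segment (such as a stop release) shows up as a peak in the returned envelope at the corresponding relative position.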

Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants are unable to verbalize their symptoms, making it difficult for healthcare professionals to diagnose their conditions accurately. Crying is often the only way for infants to communicate their needs and discomfort. In this paper, we propose a medical diagnostic system for interpreting infants' cry audio signals (CAS) using a combination of different audio-domain features and deep learning (DL) algorithms. The proposed system utilizes a dataset of labeled audio signals from infants with specific pathologies. The dataset covers two infant pathologies with high mortality rates, neonatal respiratory distress syndrome (RDS) and sepsis, together with cries from healthy infants. The system employs the harmonic ratio (HR) as a prosodic feature, Gammatone frequency cepstral coefficients (GFCCs) as a cepstral feature, and image-based features extracted from the spectrogram with a pretrained convolutional neural network (CNN); these are fused to draw on multiple domains and improve the classification rate and accuracy of the model. Different combinations of the fused features are then fed into multiple machine learning algorithms, including random forest (RF), support vector machine (SVM), and deep neural network (DNN) models. Evaluation using accuracy, precision, recall, F1-score, the confusion matrix, and the receiver operating characteristic (ROC) curve showed promising results for the early diagnosis of medical conditions in infants based on crying signals alone: the system achieved its highest accuracy of 97.50% using the combination of the spectrogram, HR, and GFCC features fused through the deep learning process.
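
The abstract names the harmonic ratio (HR) as the prosodic feature but does not define it. A minimal sketch follows, assuming the common MPEG-7-style definition: the maximum normalized autocorrelation of a frame over lags corresponding to a plausible fundamental-frequency range. The function names, the 200 to 1000 Hz search range, the 30 ms frames and the 10 ms hop are illustrative assumptions, not values from the paper.

```python
import numpy as np


def harmonic_ratio(frame, fs, f0_min=200.0, f0_max=1000.0):
    """Maximum normalized autocorrelation over lags in an assumed cry-pitch range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        head, tail = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(head, head) * np.dot(tail, tail))
        if denom > 0:
            best = max(best, float(np.dot(head, tail) / denom))
    return best  # near 1 for strongly periodic frames, near 0 for noise-like frames


def harmonic_ratio_track(signal, fs, frame_len=0.03, hop=0.01, **kwargs):
    """Frame-wise HR values for a whole recording (hypothetical helper)."""
    flen, fhop = int(round(frame_len * fs)), int(round(hop * fs))
    starts = range(0, len(signal) - flen + 1, fhop)
    return np.array([harmonic_ratio(signal[s : s + flen], fs, **kwargs) for s in starts])
```

Per-frame HR values would still need to be summarized (for example, mean and variance per recording) or aligned frame by frame before being fused with GFCCs and CNN spectrogram embeddings; the abstract does not state how the paper aggregates them.
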
The findings demonstrate the importance of fusing different audio features, especially the spectrogram, through the learning process rather than by simple concatenation, and of using deep learning algorithms to extract sparsely represented features that can later be used for classification, which improves the separation between different infant pathologies. The results outperform the published benchmark paper by extending the task to multiclass classification (RDS, sepsis, and healthy), investigating a new type of feature (the spectrogram), and introducing a new feature fusion technique, namely fusion through the learning process of the deep learning model.

Articles published on Gammatone Frequency Cepstral Coefficients

Feature Extraction Methods for Underwater Acoustic Target Recognition of Divers

Gammatone-Frequency Cepstral Coefficients Based Fear Emotion Level Recognition System

Cepstral coefficients effectiveness for gunshot classifying

Construction of multi-features comprehensive indicator for machinery health state assessment

Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques

Comparative study of respiratory sounds classification methods based on cepstral analysis and artificial neural networks

Breathing site classification via joint mel frequency cepstral coefficients and gammatone frequency cepstral coefficients approach

Developing children's ASR system under low-resource conditions using end-to-end architecture

Underwater acoustic target recognition method based on WA-DS decision fusion

Unravelling stress levels in continuous speech through optimal feature selection and deep learning

Machine Learning-Based Classification of Pulmonary Diseases through Real-Time Lung Sounds

New research on monaural speech segregation based on quality assessment

Effective feature fusion via analysis of quantitative similarity matrices among various acoustic features for underwater active target detection

Evaluation of Machine Learning Algorithms using Combined Feature Extraction Techniques for Speaker Identification

An Overview of Speech Enhancement Based on Deep Learning Techniques

Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features

Using multi-audio feature fusion for android malware detection

A Self-Attentional ResNet-LightGBM Model for IoT-Enabled Voice Liveness Detection

Comparative analysis of various feature extraction techniques for classification of speech disfluencies

Using CCA-Fused Cepstral Features in a Deep Learning-Based Cry Diagnostic System for Detecting an Ensemble of Pathologies in Newborns
