Mel Frequency Cepstral Coefficients Features Research Articles

Because pig vocalization is an important indicator of pig condition, detecting and recognizing pig vocalizations with deep learning plays a crucial role in the management and welfare of modern pig livestock farming. However, collecting pig sound data for deep learning model training takes time and effort. Acknowledging this challenge, this study introduces a deep convolutional neural network (DCNN) architecture for pig vocalization and non-vocalization classification with a real pig farm dataset. Various audio feature extraction methods were evaluated individually to compare their performance, including Mel-frequency cepstral coefficients (MFCC), Mel-spectrogram, Chroma, and Tonnetz. This study proposes a novel feature extraction method called Mixed-MMCT to improve the classification accuracy by integrating MFCC, Mel-spectrogram, Chroma, and Tonnetz features. These feature extraction methods were applied to extract relevant features from the pig sound dataset for input into a deep learning network. For the experiment, three datasets were collected from three actual pig farms: Nias, Gimje, and Jeongeup. Each dataset consists of 4000 WAV files (2000 pig vocalization and 2000 pig non-vocalization), each with a duration of three seconds. Various audio data augmentation techniques were applied to the training set to improve model performance and generalization, including pitch-shifting, time-shifting, time-stretching, and background-noising. The performance of the predictive deep learning model was assessed using k-fold cross-validation (k = 5) on each dataset. In these experiments, Mixed-MMCT showed superior accuracy on Nias, Gimje, and Jeongeup, with rates of 99.50%, 99.56%, and 99.67%, respectively. Robustness experiments were performed to test the effectiveness of the model by using two farm datasets as a training set and the remaining farm as a testing set. The average performance of Mixed-MMCT in terms of accuracy, precision, recall, and F1-score reached 95.67%, 96.25%, 95.68%, and 95.96%, respectively. All results demonstrate that the proposed Mixed-MMCT feature extraction method outperforms the other methods for pig vocalization and non-vocalization classification in real pig livestock farming.
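The two evaluation protocols named in this abstract, 5-fold cross-validation within a farm and training on two farms while testing on the third, can be sketched as index-splitting helpers. This is a minimal illustration using only the Python standard library; the function names, the shuffling, and the seed are assumptions made here, and the abstract does not say whether the authors shuffled or stratified their folds.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Shuffling with a fixed seed is an assumption of this sketch,
    not something stated in the abstract.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_one_farm_out(
    farms: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_farms, held_out_farm) for each farm in turn."""
    for held_out in farms:
        yield [farm for farm in farms if farm != held_out], held_out


# Each farm dataset in the abstract holds 4000 files, so every fold
# trains on 3200 files and tests on 800.
splits = list(k_fold_indices(4000, k=5))
farm_splits = list(leave_one_farm_out(["Nias", "Gimje", "Jeongeup"]))
```

In practice a library routine such as scikit-learn's `KFold` or `LeaveOneGroupOut` would usually replace these helpers; the hand-written version is shown only to make the split sizes explicit.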

The lack of vocal emotional expression is a major deficit in social communication disorders. Current work in artificial intelligence focuses on collaborative training of deep learning models without compromising data privacy. The primary objective of this paper is to propose a federated learning-based classification model to identify and analyze the emotional capabilities of individuals with vocal emotion deficits. The methodology develops a collaborative, privacy-preserving approach that uses federated learning to train the deep learning models. The proposed methodology utilizes Mel-frequency cepstral coefficients (MFCC) to preprocess audio recordings. Four datasets (RAVDESS, CREMA, TESS, SAVEE) of emotion-labeled audio recordings were collected from open sources. The collected audio recordings are 3 s each, and the total dataset has 668376 audio files: happy (175119 files), sad (172611 files), angry (176346 files), and normal (144300 files). The input audio was pre-processed to generate MFCC features. The study began with extracting features from multiple pre-trained DL models as its base model. Then, the performance of the federated learning (FL) model was tested on independent and identically distributed (IID) and non-IID data. Further, this paper presents a federated deep learning-based multimodal system for verbal communication emotions classification that uses audio datasets and meets data privacy requirements by running DL within the FL ecosystem. According to the findings, the model trained with federated learning gives results nearly matching those of base model training. For IID data, the model had a validation accuracy of 99.71%, precision of 99.73%, recall of 99.69%, and validation loss of 0.01. The FL architecture with non-IID data outperformed these measures, with validation accuracy of 99.97%, precision of 99.97%, recall of 99.97%, and the lowest loss (0). Hence the acquired results support the use of models trained in a federated learning ecosystem on identically and non-identically distributed audio features for emotion identification without losing parametric results. In conclusion, the proposed techniques could be applied to identify verbal emotional deficits in individuals and could support developing emerging technological interventions for their well-being.
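The abstract does not name the aggregation rule used in its federated learning setup. As a hedged illustration of how client models can be combined without sharing raw audio, the sketch below assumes federated averaging (FedAvg), in which the server averages client parameters weighted by each client's local sample count. The function name and the flat-list parameter representation are hypothetical simplifications introduced here.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Assumes FedAvg: each client's parameters are weighted by its share
    of the total number of training samples. Only parameters reach the
    server; the audio recordings stay on the clients.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one parameter shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Two hypothetical clients: the second holds three times as much data,
# so its parameters dominate the average.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Under a non-IID partition the per-client sample counts and class mixes differ, which is why the weighting by `client_sizes` matters; with equal-sized IID clients the rule reduces to a plain mean.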

Articles published on Mel Frequency Cepstral Coefficients Features

Emotion Recognition in Kurdish Speech from the Sorani Dialect Corpus

Sound Quality Prediction Method of Dual-Phase Hy-Vo Chain Transmission System Based on MFCC-CNN and Fuzzy Generation

A speech-based convolutional neural network for human body posture classification

Classification of Infant Crying Sounds Using SE-ResNet-Transformer

Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition

Audio Signal Recognition in Complex Environments Using Sparse Representation

A Multi-Feature Fusion Approach for Dialect Identification using 1D CNN

A multi-modal Parkinson’s disease diagnosis system from EEG signals and online handwritten tasks using grey wolf optimization based deep learning model

Application of Deep Learning for Voice Command Classification in Turkish Language

Feature analysis and recognition of fiber breakage AE signals after propagation

Implementation of Data Mining for Speech Recognition Classification of Sundanese Dialect Using KNN Method with MFCC Feature Extraction

DCNN for Pig Vocalization and Non-Vocalization Classification: Evaluate Model Robustness with New Data

A lightweight and privacy preserved federated learning ecosystem for analyzing verbal communication emotions in identical and non-identical databases

Crack detection based on mel-frequency cepstral coefficients features using multiple classifiers

Deepfake Audio Detection System

Responding to challenge call for machine learning model development in diagnosing respiratory disease sounds

A Study on Speech Recognition by a Neural Network Based on English Speech Feature Parameters

Research on sound quality of roller chain transmission system based on multi-source transfer learning

Automatically detecting OSAHS patients based on transfer learning and model fusion

Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset
