Advanced AI techniques for video classification: a comprehensive framework using multiple feature extraction and classification methods

Abstract

The increasing popularity of multimedia applications, such as video classification, has underscored the need for efficient methods to manage and categorize vast video datasets. Video classification simplifies video categorization, enhancing searchability and retrieval by leveraging distinctive features extracted from textual, audio, and visual components. This paper introduces an automated video recognition system that classifies video content into motion types (low, medium, and high) derived from visual component characteristics. The proposed system combines advanced artificial intelligence techniques with four feature extraction methods: (1) MFCC alone, (2) MFCC after applying the Discrete Wavelet Transform (DWT), (3) denoised MFCC, and (4) MFCC after applying denoised DWT, together with seven classification algorithms to optimize accuracy. A novel aspect of this study is the application of Mel Frequency Cepstral Coefficients (MFCC) to extract features from the video domain rather than their traditional use in audio processing, demonstrating the effectiveness of MFCC for video classification. Seven classification techniques, K-Nearest Neighbors (KNN), Radial Basis Function Support Vector Machines (SVM-RBF), the Parzen Window Method, Neighborhood Components Analysis (NCA), Multinomial Logistic Regression (ML), Linear Support Vector Machines (SVM Linear), and Decision Trees (DT), are evaluated to establish a robust classification framework. Experimental results indicate that this denoising-enhanced system significantly improves classification accuracy, providing a comprehensive framework for future applications in multimedia management and other fields.
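The abstract does not include an implementation, but the four feature variants can be sketched roughly as follows. This is a minimal, assumption-laden sketch: a one-level Haar DWT, soft-threshold denoising, and a simplified cepstral feature (DCT of the log power spectrum) stand in for the paper's full MFCC pipeline; all function names are illustrative, not the authors'.

```python
import numpy as np
from scipy.fftpack import dct

def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]  # trim to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def denoise(x, k=1.0):
    """Soft-threshold denoising at k times the median absolute deviation."""
    t = k * np.median(np.abs(x - np.median(x)))
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def mfcc_like(signal, n_coeff=13):
    """Toy cepstral features: DCT of the log power spectrum.

    A real MFCC implementation would insert a mel filterbank before the
    log; this stand-in keeps the sketch short and self-contained.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    log_spec = np.log(spectrum + 1e-10)
    return dct(log_spec, norm="ortho")[:n_coeff]

sig = np.random.default_rng(0).standard_normal(1024)  # stand-in for a frame sequence
f1 = mfcc_like(sig)                         # (1) MFCC alone
f2 = mfcc_like(haar_dwt(sig)[0])            # (2) MFCC after DWT
f3 = mfcc_like(denoise(sig))                # (3) denoised MFCC
f4 = mfcc_like(haar_dwt(denoise(sig))[0])   # (4) MFCC after denoised DWT
```

Each variant yields a fixed-length feature vector that any of the seven classifiers listed in the abstract could consume.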

Similar Papers
  • Conference Article
  • Cited by 4
  • 10.1117/12.2073614
Automatic detection of wheezes by evaluation of multiple acoustic feature extraction methods and C-weighted SVM
  • Jan 28, 2015
  • Fabio A González + 2 more

This work addresses the problem of lung sound classification, in particular, the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step in associating lung sounds with an abnormal state of the respiratory system, usually linked to tuberculosis or other chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification, which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. The feature extraction methods used here are commonly applied in speech recognition and related problems because they capture the most informative spectral content of the original signals. The evaluated methods were: the Fourier transform (FT), wavelet decomposition using a Wavelet Packet Transform (WPT) bank of filters, and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combinations of features and/or classifiers. The methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as the validation set a pair of cases, one including normal sounds and the other including wheezing sounds. Experimental results were reported in terms of traditional classification performance measures: sensitivity, specificity, and balanced accuracy. Our best results with the suggested approach, a C-weighted SVM with MFCC features, achieve a balanced accuracy of 82.1%, the best result reported for this problem to date. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem even using the same feature extraction methods.
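As a rough illustration of the class-weighting idea (not the authors' code: the data below are synthetic stand-ins for MFCC features, and scikit-learn's "balanced" per-class weighting is one common way to realize a C-weighted SVM):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic unbalanced dataset: 90 "normal" frames vs 10 "wheeze" frames,
# each a 13-dimensional feature vector (labels and dimensions illustrative).
X = np.vstack([rng.normal(0.0, 1.0, (90, 13)),
               rng.normal(2.0, 1.0, (10, 13))])
y = np.array([0] * 90 + [1] * 10)

# Class-weighted RBF SVM: the minority (wheeze) class receives a larger
# effective C, counteracting the class imbalance during training.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
```

With unweighted training the classifier tends to favor the majority class; per-class weighting trades a little specificity for much better sensitivity on the rare class, which matches the balanced-accuracy metric the paper reports.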

  • Research Article
  • Cited by 44
  • 10.1109/jsen.2019.2927754
Condition Monitoring of Machines Using Fused Features From EMD-Based Local Energy With DNN
  • Aug 1, 2020
  • IEEE Sensors Journal
  • Seetaram Maurya + 2 more

Several data-driven methods, such as signal processing and machine learning, exist separately to analyze non-linear and non-stationary data, but their performance degrades due to insufficient information in real-time applications. To improve performance, this paper proposes a novel feature extraction method using a fusion of hand-crafted (low-level) features and high-level features, followed by feature extraction/selection on the fused features. Local energy-based hand-crafted features are derived from empirical mode decomposition, and high-level features are extracted from a deep neural network. A method is also proposed for reducing the massive number of data points in the samples. The proposed scheme studies the effect of varying the number of extracted/selected features. Its effectiveness is validated through three case studies: (a) an acoustic dataset collected from a reciprocating-type air compressor, (b) a vibration dataset collected from a deep groove ball bearing, and (c) a steel plate faults dataset. On the acoustic dataset, classification accuracies as high as 100.0%, 99.78%, and 99.78% are obtained using the random forest, linear support vector machine, and radial basis function support vector machine, respectively, with 5-fold cross-validation. Similarly, accuracies of 100.0% are obtained on the vibration dataset. The proposed scheme has been compared with ten conventional methods under five-fold cross-validation. These experimental results show considerable improvement in the prediction performance of machine conditions using the proposed scheme.

  • Research Article
  • Cited by 20
  • 10.1016/j.eswa.2017.05.017
Multiple-rank supervised canonical correlation analysis for feature extraction, fusion and recognition
  • May 8, 2017
  • Expert Systems with Applications
  • Xizhan Gao + 2 more


  • Conference Article
  • Cited by 234
  • 10.1109/icspcs.2010.5709752
A novel approach for MFCC feature extraction
  • Dec 1, 2010
  • Md Afzal Hossan + 2 more

The Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method is a leading approach for speech feature extraction, and current research aims to identify performance enhancements. One recent MFCC implementation is the Delta-Delta MFCC, which improves speaker verification. In this paper, a new MFCC feature extraction method based on the distributed Discrete Cosine Transform (DCT-II) is presented. Speaker verification tests are proposed based on three feature extraction methods: conventional MFCC, Delta-Delta MFCC, and distributed DCT-II based Delta-Delta MFCC, with a Gaussian Mixture Model (GMM) classifier.

  • Research Article
  • Cited by 125
  • 10.1155/2018/1214301
Deep Learning Methods for Underwater Target Feature Extraction and Recognition.
  • Jan 1, 2018
  • Computational Intelligence and Neuroscience
  • Gang Hu + 5 more

The classification and recognition of underwater acoustic signals have long been an important research topic in underwater acoustic signal processing. Currently, the wavelet transform, the Hilbert-Huang transform, and Mel frequency cepstral coefficients are used as feature extraction methods for underwater acoustic signals. In this paper, a method for feature extraction and identification of underwater noise data based on a CNN and an ELM is proposed. An automatic feature extraction method for underwater acoustic signals is proposed using a deep convolutional network, and an underwater target recognition classifier is built on an extreme learning machine. Although convolutional neural networks can perform both feature extraction and classification, their classification function relies mainly on a fully connected layer trained by gradient descent, whose generalization ability is limited and suboptimal; therefore, an extreme learning machine (ELM) is used in the classification stage. First, the CNN learns deep and robust features, after which the fully connected layers are removed. Then an ELM fed with the CNN features is used as the classifier. Experiments on an actual dataset of civil ships achieved a 93.04% recognition rate; compared to traditional Mel frequency cepstral coefficient and Hilbert-Huang features, the recognition rate is greatly improved.

  • Research Article
  • Cited by 155
  • 10.1016/j.eswa.2017.08.015
Speaker identification features extraction methods: A systematic review
  • Aug 16, 2017
  • Expert Systems with Applications
  • Sreenivas Sremath Tirumala + 3 more


  • Research Article
  • Cited by 25
  • 10.1007/s11277-021-08181-0
Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language
  • Feb 13, 2021
  • Wireless Personal Communications
  • Shobha Bhatt + 2 more

The research work presents experimental work to build a speaker-independent connected-word Hindi speech recognition system using different feature extraction techniques, with a comparative analysis of confusing words. Comparative analysis of confusing words is essential to understand the reasons for speech recognition errors. Based on the error analysis, different feature extraction techniques, classification techniques, acoustic models, and pronunciation dictionaries can be selected to improve the speech recognition system's performance. Earlier studies of Hindi speech recognition lack a detailed comparative analysis of confusing words for different feature extraction methods. As speaker-independent systems are developed for all users, a comparative analysis of confusing words is also presented for all feature extraction techniques. A speaker-independent system was proposed with a five-state monophone-based hidden Markov model (HMM) using the HMM-based toolkit HTK. A self-created Hindi speech corpus was used in the experiment. Feature extraction techniques such as linear predictive coding cepstral coefficients (LPCCs), mel frequency cepstral coefficients (MFCCs), and perceptual linear prediction coefficients (PLPs) were applied using delta, double-delta, and energy parameters to evaluate the performance of the proposed methodology. The system was assessed using different feature extraction techniques in speaker-independent mode. Research findings reveal that PLP coefficients show the highest recognition score, while LPCCs obtained the lowest. Investigations also reveal that both PLP and MFCC coefficients outperform LPCCs in speech recognition. Comparative analysis of confusing words shows that PLPs and MFCCs produce fewer confusions than LPCCs and exhibit mostly the same pattern in the confusion analysis. Research outcomes also reveal that substitution errors are a significant cause of low recognition accuracy.
It was also found that some words were recognized with individual feature extraction techniques only. Confusion analysis indicates that words with nasal, liquid, and fricative sounds in the initial position exhibit more confusions. The investigation could improve speech recognition by choosing an appropriate feature extraction method and mixing the various feature extraction methods. The research outcomes can also be utilized to build linguistic resources for improving speech recognition. The results show that the developed recognition framework achieved the highest word accuracy of 76.68% with PLPs for the speaker-independent model. The proposed system was also compared with existing similar work.

  • Conference Article
  • Cited by 3
  • 10.1117/12.2060517
Research on the feature extraction and pattern recognition of the distributed optical fiber sensing signal
  • Sep 12, 2014
  • Shaohua Pi + 3 more

In this paper, feature extraction and pattern recognition of the distributed optical fiber sensing signal are studied. We adopt Mel-Frequency Cepstral Coefficient (MFCC), wavelet packet energy, and wavelet packet Shannon entropy feature extraction methods to obtain characteristic vectors for sensing signals (such as speech, wind, thunder, and rain signals), and then perform pattern recognition via an RBF neural network. The performances of these three feature extraction methods are compared according to the results. We choose the MFCC characteristic vector to be 12-dimensional. For wavelet packet feature extraction, signals are decomposed into six layers by the Daubechies wavelet packet transform, from which 64 frequency constituents are extracted as the characteristic vector. In the pattern recognition process, a diffusion coefficient is introduced to increase the recognition accuracy, while keeping the test samples the same. Recognition results show that the wavelet packet Shannon entropy feature extraction method yields the best recognition accuracy, up to 97%; the performance of the 12-dimensional MFCC feature extraction method is less satisfactory; and the performance of the wavelet packet energy feature extraction method is the worst.

  • Conference Article
  • 10.1109/tencon.2017.8228150
Non-linear filtering for feature enhancement of reverberant speech
  • Nov 1, 2017
  • Amit Kumar Verma + 3 more

Speaker identification implemented on a mobile robot is a challenging problem because of the varying reverberant environments the robot encounters while in motion. The performance of a typical speaker identification system degrades significantly in reverberant environments, mainly because conventional features are not robust to changes in reverberant conditions. In this paper, we present a non-linear filter based mel frequency cepstral coefficient (MFCC) feature extraction method, which is more robust to such changes. This feature extraction method is a two-stage operation applied to the spectrogram of the speech signal. The first stage suppresses the frequency spread due to reverberation within each frame, and the second stage suppresses the reverberation effect across frames. Performance is evaluated with a GMM-UBM based identifier built and tested with conventional MFCC feature vectors and with the non-linear filter based MFCC feature vectors. We show that the identification accuracy of the GMM-UBM based identifier with non-linear filter based MFCC feature vectors is better than with conventional MFCC feature vectors.
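The abstract does not specify which non-linear filter is used. As one hedged sketch, a median filter, a standard non-linear smoother, can stand in for both stages on a synthetic spectrogram (the data and filter window sizes below are purely illustrative):

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
# Synthetic magnitude spectrogram (frequency bins x time frames), a
# stand-in for the STFT of a reverberant speech signal.
S = np.abs(rng.standard_normal((64, 100)))

# Stage 1: non-linear filtering along frequency within each frame,
# suppressing reverberation-induced frequency spread.
stage1 = median_filter(S, size=(5, 1))

# Stage 2: non-linear filtering across frames, suppressing the
# reverberant tails that smear energy over time.
enhanced = median_filter(stage1, size=(1, 5))
```

MFCCs would then be computed from `enhanced` rather than `S`, which is what makes the resulting feature vectors less sensitive to the room response.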

  • Conference Article
  • Cited by 70
  • 10.1109/icoiact.2018.8350748
Improvement of MFCC feature extraction accuracy using PCA in Indonesian speech recognition
  • Mar 1, 2018
  • Anggun Winursito + 2 more

Many methods are used in pattern recognition systems. For speech recognition, Mel Frequency Cepstral Coefficients (MFCC) is a popular feature extraction method, but it has weaknesses, particularly regarding accuracy and the high dimensionality of the resulting features. This paper presents the combination of MFCC feature extraction with Principal Component Analysis (PCA) to improve accuracy in an Indonesian speech recognition system. By combining MFCC and PCA, the aim was to increase system accuracy while reducing the feature dimension. The MFCC features, augmented with delta coefficients, formed a data matrix that was then reduced using PCA; the PCA reduction was designed in two versions. The PCA-reduced data were then classified using the K-Nearest Neighbour (KNN) method. The dataset comprised 140 speech recordings from 28 speakers. The findings showed that PCA version 1 reduced the feature dimension from 26 to 12 while matching the accuracy of conventional MFCC without PCA (86.43%), whereas PCA version 2 increased the accuracy from 86.43% to 89.29% while reducing the dimension from 26 to 10.
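A minimal sketch of the MFCC-then-PCA-then-KNN pipeline described above (random features replace the real MFCC+delta matrix; the dimensions mirror the paper's 26-to-10 "version 2" reduction, and the number of classes is an assumption):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# 140 utterances x 26-dim features (13 MFCC + 13 delta), mirroring the
# paper's setup; random data here, real MFCC features in practice.
X = rng.standard_normal((140, 26))
y = rng.integers(0, 5, 140)  # 5 hypothetical word classes

# PCA reduces 26 dims to 10, as in the paper's "version 2".
pca = PCA(n_components=10)
X_red = pca.fit_transform(X)

# KNN classifies in the reduced feature space.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_red, y)
```

Note that at test time new feature vectors must go through the same fitted `pca.transform` before being passed to `knn.predict`, so the projection learned on training data is reused, not refit.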

  • Research Article
  • Cited by 75
  • 10.32604/cmc.2022.023278
Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning
  • Jan 1, 2022
  • Computers, Materials & Continua
  • Uğur Ayvaz + 5 more

Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize voice signals. One of the most commonly used feature extraction methods is Mel Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs process voice signals with high accuracy. MFCCs represent a sequence of voice signal-specific features. This experimental analysis is proposed to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step of the MFCC method we converted the obtained log filterbanks into decibel (dB) feature-based spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created by converting the spectrograms into 2-D arrays. Several learning algorithms were implemented with 10-fold cross-validation to detect the speaker. The highest accuracy of 90.2% was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.

  • Research Article
  • Cited by 11
  • 10.1515/jisys-2018-0057
Optimizing Integrated Features for Hindi Automatic Speech Recognition System
  • Oct 1, 2018
  • Journal of Intelligent Systems
  • Mohit Dua + 2 more

An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficients (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficients (GFCC) for extracting features in the training phase. The paper first proposes sequential combinations of all three feature extraction methods, taken two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system with each was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.

  • Research Article
  • Cited by 6
  • 10.1007/s10772-012-9166-0
Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization
  • Jun 28, 2012
  • International Journal of Speech Technology
  • M Afzal Hossan + 1 more

In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The new ASR system includes novel feature extraction and vector classification steps utilizing distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm utilizes an MFCC-based approach to identify dynamic features used for Speaker Recognition (SR). A series of experiments was performed utilizing three different feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde-Buzo-Gray VQ; and (4) Gaussian Mixture Model (GMM). The combination of DCT-II based MFCC, DMFCC, and DDMFCC with FVQ was found to have the lowest Equal Error Rate among the VQ-based classifiers. The results were an improvement over previously reported non-GMM methods and approached the results achieved by the computationally expensive GMM-based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. The National Institute of Standards and Technology Speaker Recognition Evaluation corpus was used to provide speaker source data for the experiments.

  • Research Article
  • Cited by 1
  • 10.3389/fcvm.2024.1425275
Development and evaluation of a machine learning model for post-surgical acute kidney injury in active infective endocarditis.
  • Dec 5, 2024
  • Frontiers in cardiovascular medicine
  • Xinpei Liu + 4 more

Acute kidney injury (AKI) is notably prevalent after cardiac surgery for patients with active infective endocarditis. This study aims to create a machine learning model to predict AKI in this high-risk group, improving upon existing models by focusing specifically on endocarditis-related surgeries. We analyzed medical records from 527 patients who underwent cardiac surgery for active infective endocarditis from January 2012 to December 2023. Feature selection was performed using LASSO regression. These features informed the development of machine learning models, including logistic regression, linear and radial basis function support vector machines, XGBoost, decision trees, and random forests. The optimal model was selected based on ROC curve AUC. Model performance was assessed through discrimination, calibration, and clinical utility, with explanations provided by SHAP values. Post-surgical AKI was observed in 261 patients (49.53%). LASSO regression identified 25 significant features for the models. Among the six algorithms tested, the radial basis function support vector machine (RBF-SVM) had the highest AUC at 0.771. The 15 most critical features were valve replacement, pre-operative hypertension, large vegetations, NYHA class, alcoholism, age, post-operative low cardiac output syndrome, TyG index, pre-operative creatinine clearance, cardiopulmonary bypass duration, intra-operative red blood cell transfusion, intra-operative urine output, pre-operative hemoglobin levels, and timing of surgery. Compared to standard cardiac surgery, AKI occurs more frequently and with a more complex etiology in surgeries for active infective endocarditis. Machine learning models enable early prediction of post-surgical AKI, facilitating targeted perioperative optimization and risk stratification in this distinct patient group.
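The LASSO-selection-then-SVM workflow can be sketched as follows. This is not the authors' code: the cohort data below are synthetic, L1-penalised logistic regression stands in as the LASSO-style selector (a common choice for classification targets), and the penalty strength is an assumed hyperparameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic cohort: 527 patients x 40 candidate features (the patient count
# mirrors the paper; the features and labels here are invented).
X = rng.standard_normal((527, 40))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(527) > 0).astype(int)

# LASSO-style selection: the L1 penalty zeroes out coefficients of
# uninformative features; the survivors are the selected features.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])

# RBF-SVM trained on the selected features only (the model family the
# paper reports as best, AUC 0.771 on its real data).
svm = SVC(kernel="rbf").fit(X[:, selected], y)
```

In a real study the selection and the SVM would both sit inside the cross-validation loop so that feature selection never sees held-out patients; fitting the LASSO on the full dataset, as this sketch does for brevity, would leak information.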

  • Research Article
  • Cited by 36
  • 10.1007/s11277-019-06373-3
Feature Extraction Methods in Language Identification: A Survey
  • Apr 22, 2019
  • Wireless Personal Communications
  • Deepti Deshwal + 2 more

Language Identification (LI) is one of the widely emerging fields in speech processing, aiming to accurately identify the language of an utterance based on features of the speech signal. LI technologies have a wide set of applications in different spheres due to growing advancements in artificial intelligence and machine learning. Feature extraction is one of the fundamental and significant processes performed in LI. This review presents the main paradigms of research in feature extraction methods, giving researchers deep insight into feature extraction techniques for future studies in LI. Broadly, the review summarizes and compares various feature extraction approaches with and without noise compensation techniques, as the current trend is towards a robust universal language identification framework. The paper categorizes the different feature extraction approaches on the basis of the features used, the human speech production system/peripheral auditory system, spectral or cepstral analysis, and the underlying transform. The different noise compensation-based feature extraction techniques are also covered. The review also shows that Mel-Frequency Cepstral Coefficients (MFCCs) are the most popular approach. Results indicate that MFCCs fused with other feature vectors and cleansing approaches give improved performance compared to pure MFCC-based feature extraction. This study also describes the different categories at the front end of the LI system from a research point of view.
