A speech signal representation technique based on empirical mode decomposition

Similar Papers
  • Research Article
  • 10.1142/s179353691350009x
ANALYSIS OF SIGNAL TREND BY A PSEUDO-EMD METHOD WITH APPLICATIONS TO WEATHER AND SPEECH DATA
  • Apr 1, 2013
  • Advances in Adaptive Data Analysis
  • Sharif M A Bhuiyan + 2 more

Empirical mode decomposition (EMD) has been established as a valuable tool for determining nonlinear signal trends. EMD decomposes a one-dimensional (1D) signal into hierarchical components, known as intrinsic mode functions (IMFs), and a residue, based on the local properties of the signal. The first IMF captures the highest-frequency local oscillations, while the residue captures the trend of the signal/data. In each iteration of the EMD process, interpolation is applied to the local maxima and minima to form the upper and lower envelopes, respectively. However, interpolation incurs high computation time and introduces artifacts into the decomposition, which limits the use of EMD for many real-life signals. This paper proposes an effective method that replaces the interpolation step with direct envelope estimation using order-statistics filters, which reduces computation time; it follows a similar EMD approach recently proposed for two-dimensional data and image analysis. The modified EMD, called pseudo-EMD (P-EMD), is particularly useful for determining, analyzing, and/or modifying the trend of various signals to obtain desired results. Several synthetic and real-life signals, such as speech, sea-level pressure, and temperature, are tested to verify the effectiveness of P-EMD. The results show that P-EMD is a superior alternative for trend analysis of signals/data, since it yields a more accurate trend than other interpolation-based EMD methods, such as classical EMD (CEMD) and a modified EMD (MEMD), and it also computes faster.
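To make the envelope step concrete, here is a minimal sketch of one sifting stage in which the upper and lower envelopes come from order-statistics (running max/min) filters followed by a moving-average smoother, so no spline interpolation is used. The window length `w`, the fixed number of sifting passes, the edge padding, and the helper names `centred_filter` and `os_envelope_sift` are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def centred_filter(x, w, reduce):
    """Apply `reduce` (np.max, np.min or np.mean) over a centred odd-length window."""
    if w % 2 == 0:
        raise ValueError("window length w must be odd")
    half = w // 2
    padded = np.pad(x, (half, half), mode="edge")  # simple edge handling (assumed)
    return reduce(sliding_window_view(padded, w), axis=1)


def os_envelope_sift(x, w, n_sifts=3):
    """Extract one IMF using order-statistics envelopes instead of splines.

    Upper envelope: running max, then moving average of the same width.
    Lower envelope: running min, then moving average of the same width.
    """
    x = np.asarray(x, dtype=float)
    h = x.copy()
    for _ in range(n_sifts):
        upper = centred_filter(centred_filter(h, w, np.max), w, np.mean)
        lower = centred_filter(centred_filter(h, w, np.min), w, np.mean)
        h = h - 0.5 * (upper + lower)  # subtract the local mean envelope
    imf = h
    residue = x - imf  # what is left carries the slower trend
    return imf, residue


# Slow trend plus a fast oscillation with a 25-sample period.
n = 1000
t = np.arange(n) / n
trend = 0.5 * np.sin(2 * np.pi * t)
fast = np.sin(2 * np.pi * 40 * t)
imf, residue = os_envelope_sift(trend + fast, w=25)
```

Choosing `w` close to one period of the fastest oscillation lets the running max/min track its peaks and troughs; a larger window would push more of the oscillation into the residue.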

  • Conference Article
  • Citations: 3
  • 10.1109/icalip.2014.7009748
An intrinsic mode function basis dictionary for auditory signal processing
  • Jul 1, 2014
  • Chang Gao + 2 more

As an important area of sparse representation, dictionary learning has attracted the interest of many researchers in signal processing. Empirical mode decomposition (EMD), an efficient and adaptive signal decomposition method that depends entirely on the signal, is regarded as an innovative and appropriate basis-function theory. The intrinsic mode functions (IMFs) obtained by EMD serve as the basis of an expansion that can be linear or nonlinear, as dictated by the data, and their linear combination is an efficient representation of the original signal. However, IMFs cannot be used directly for the sparse representation of signals, and their application to auditory signal processing is quite limited. In this paper, we propose a universal dictionary learning algorithm that transforms raw IMFs into useful basis functions. The signals are first decomposed into IMFs by EMD, a general dictionary learning algorithm is then applied to these IMFs, and the IMF basis dictionary is finally learned. Sparse representation and reconstruction experiments on speech signals verify the effectiveness and efficiency of the proposed IMF basis dictionary. The results show that the signal-to-noise ratio between the reconstructed and original speech signals is much higher than with other traditional dictionaries, and that better sparsity is achieved.
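The abstract does not name the sparse coder used with the IMF basis dictionary. A common choice in dictionary-learning pipelines is orthogonal matching pursuit (OMP), so the sketch below assumes OMP; the random unit-norm atoms are placeholders for atoms that, in this setting, would be derived from IMFs.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D.

    D is assumed to have unit-norm columns (atoms), e.g. normalized IMF segments.
    """
    residual = np.asarray(y, dtype=float).copy()
    support = []
    sol = np.zeros(0)
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)  # least-squares refit on support
        residual = y - sub @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef[support] = sol
    return coef


rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)  # unit-norm placeholder atoms
y = 3.0 * D[:, 5] - 2.0 * D[:, 17]  # a signal that is exactly 2-sparse in D
coef = omp(D, y, k=2)
```

Refitting all selected coefficients at every step is what distinguishes OMP from plain matching pursuit and keeps the residual orthogonal to the chosen atoms.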

  • Research Article
  • Citations: 29
  • 10.1016/j.dsp.2016.07.012
A better decomposition of speech obtained using modified Empirical Mode Decomposition
  • Jul 22, 2016
  • Digital Signal Processing
  • Rajib Sharma + 1 more

  • Research Article
  • Citations: 2
  • 10.3390/a18010025
Hybrid Empirical and Variational Mode Decomposition of Vibratory Signals
  • Jan 5, 2025
  • Algorithms
  • Eduardo Esquivel-Cruz + 5 more

Signal analysis is a fundamental field in engineering and data science, focused on the study of signal representation, transformation, and manipulation. The accurate estimation of harmonic vibration components and their associated parameters in vibrating mechanical systems presents significant challenges in the presence of very similar frequencies and mode mixing. In this context, a hybrid strategy to estimate harmonic vibration modes in weakly damped, multi-degree-of-freedom vibrating mechanical systems by combining Empirical Mode Decomposition and Variational Mode Decomposition is described. In this way, this hybrid approach leverages the detection of mode mixing based on the analysis of intrinsic mode functions through Empirical Mode Decomposition to determine the number of components to be estimated and thus provide greater information for Variational Mode Decomposition. The computational time and dependency on a predefined number of modes are significantly reduced by providing crucial information about the approximate number of vibratory components, enabling a more precise estimation with Variational Mode Decomposition. This hybrid strategy is employed to compute unknown natural frequencies of vibrating systems using output measurement signals. The algorithm for this hybrid strategy is presented, along with a comparison to conventional techniques such as Empirical Mode Decomposition, Variational Mode Decomposition, and the Fast Fourier Transform. Through several case studies involving multi-degree-of-freedom vibrating systems, the superior and satisfactory performance of the hybrid method is demonstrated. Additionally, the advantages of the hybrid approach in terms of computational efficiency and accuracy in signal decomposition are highlighted.
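The Fast Fourier Transform is one of the conventional baselines named above. A minimal sketch of FFT peak picking for natural-frequency estimation is shown below, assuming the number of modes is already known; in the hybrid method that count would come from the EMD mode-mixing analysis. The Hann window, the simple local-maximum test, and the function name `fft_natural_frequencies` are illustrative assumptions.

```python
import numpy as np


def fft_natural_frequencies(x, fs, n_modes):
    """Return the n_modes strongest spectral peak frequencies (Hz), sorted ascending.

    n_modes is assumed known in advance (e.g. estimated from an EMD analysis).
    """
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Interior bins that are larger than both neighbours.
    inner = spectrum[1:-1]
    peaks = np.flatnonzero((inner > spectrum[:-2]) & (inner > spectrum[2:])) + 1
    strongest = peaks[np.argsort(spectrum[peaks])[::-1][:n_modes]]
    return np.sort(freqs[strongest])


# Two weakly damped modes at 12.5 Hz and 31 Hz, sampled at 200 Hz for 20 s.
fs = 200.0
t = np.arange(4000) / fs
x = np.exp(-0.05 * t) * np.sin(2 * np.pi * 12.5 * t) \
    + 0.6 * np.exp(-0.08 * t) * np.sin(2 * np.pi * 31.0 * t)
est = fft_natural_frequencies(x, fs, n_modes=2)
```

With closely spaced modes the two main lobes merge and this peak picking fails, which is the situation the abstract's hybrid EMD/VMD strategy targets.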

  • Research Article
  • Citations: 4
  • 10.1515/jisys-2013-0089
Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions
  • Apr 2, 2014
  • Journal of Intelligent Systems
  • M.S Rudramurthy + 3 more

Speaker recognition (SR) under mismatched conditions is a challenging task. Speech signals are nonlinear and nonstationary and are therefore difficult to analyze under realistic conditions. Moreover, in real conditions the nature of the noise present in speech data is not known a priori. In such cases, the performance of speaker identification (SI) or speaker verification (SV) degrades considerably. Every SR system uses a voice activity detector (VAD) as its front-end subsystem, and the performance of most VADs deteriorates under degraded or realistic conditions in which noise plays a major role. Speech data analysis and processing using Norden E. Huang's empirical mode decomposition (EMD) combined with the Hilbert transform, commonly referred to as the Hilbert–Huang transform (HHT), has recently become an emerging trend. EMD is an a posteriori, adaptive, time-domain data analysis tool that is widely accepted by the research community, and its use for speech analysis and processing in speech recognition and SR tasks has been increasing. EMD-based VAD has become an important adaptive subsystem of SR systems, largely mitigating the effect of mismatch between the training and testing phases. We recently developed a VAD algorithm using a zero-frequency filter-assisted peaking resonator (ZFFPR) and EMD. In this article, the efficacy of this EMD-based VAD algorithm is studied at the front end of a text-independent, language-independent SI task, using speaker data collected in three languages at five different places (home, street, laboratory, college campus, and restaurant) under realistic conditions with an EDIROL-R09 HR, a 24-bit WAV/MP3 recorder. The performance of this SI task is compared with that obtained using a traditional energy-based VAD, in terms of percentage identification rate.
In both cases, the widely used Mel-frequency cepstral coefficients (MFCCs) are computed by frame processing (20-ms frame size and 10-ms frame shift) on the voiced speech regions that the respective VAD technique extracts from the realistic utterances, and are used as feature vectors for speaker modeling with Gaussian mixture models (GMMs). The experimental results show that the SI task with the ZFFPR-and-EMD VAD at its front end performs better than the SI task with a short-term energy-based VAD at its front end, which is somewhat encouraging.
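The baseline front end above combines 20-ms/10-ms framing with a short-term energy VAD. A minimal sketch of both pieces follows; the relative energy threshold of 0.1 and the function names `frame_signal` and `energy_vad` are assumptions, and the ZFFPR/EMD detector itself is not reproduced here.

```python
import numpy as np


def frame_signal(x, fs, frame_ms=20.0, shift_ms=10.0):
    """Split x into overlapping frames (20 ms frames with a 10 ms shift by default)."""
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * shift_ms / 1000))
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def energy_vad(frames, rel_threshold=0.1):
    """Mark frames whose short-term energy exceeds a fraction of the maximum energy."""
    energy = np.sum(frames.astype(float) ** 2, axis=1)
    return energy > rel_threshold * energy.max()


fs = 8000
tone = 0.5 * np.sin(2 * np.pi * 150 * np.arange(2400) / fs)
x = np.concatenate([np.zeros(1600), tone, np.zeros(1600)])  # silence, tone, silence
frames = frame_signal(x, fs)  # 160-sample frames, 80-sample hop
vad = energy_vad(frames)
```

A fixed energy threshold like this is exactly what degrades in realistic noise, since noise raises the energy of non-speech frames; that weakness is the motivation for the EMD-based detector.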

  • Conference Article
  • Citations: 6
  • 10.21437/interspeech.2008-627
Robust voiced/unvoiced speech classification using empirical mode decomposition and periodic correlation model
  • Sep 22, 2008
  • Md Khademul Islam Molla + 2 more

This paper presents a method for voiced/unvoiced (V/Uv) classification of noisy speech signals. Empirical mode decomposition (EMD), a recently developed tool for analyzing nonlinear and non-stationary signals, is used to filter out the additive noise in the speech signal. The normalized autocorrelation of the filtered speech signal is then computed to enhance any periodicity. Voiced speech is assumed to be periodically correlated and unvoiced speech is not, so a statistical model for detecting periodic correlation is used to discriminate voiced from unvoiced speech at low SNR. The experimental results show that using EMD improves classification performance and that the overall efficiency is noticeably better than that of other existing algorithms.
Index Terms: empirical mode decomposition, normalized autocorrelation, periodic correlation, voiced/unvoiced speech
1. Introduction
Reliable classification of short-time speech segments into voiced and unvoiced is a crucial preprocessing step in many speech processing applications and is essential in most analysis and synthesis systems. For example, different strategies can be adopted for voiced and unvoiced parts in speech enhancement using spectral subtraction. The essence of the classification is to determine whether speech production involves vibration of the vocal cords [1]. The discrimination problem is an important one and has been studied extensively over the last three decades [2]. The discrimination can be performed effectively using a single feature or parameter that is closely associated with the voicing and non-voicing activity of the speech signal. Many algorithms have been reported for this detection problem [3]–[7]. In [3], a Gaussian mixture model with cepstral coefficient features is proposed for robust V/Uv classification. A higher-order statistics (HOS) based method is proposed in [4] for simultaneous V/Uv detection and pitch estimation. The matching pursuit algorithm with Gabor decomposition is used in [5]. The wavelet transform is proposed for pitch and V/Uv detection in [6]. A statistical model applied in the autocorrelation domain is reported in [7]. Most existing algorithms are not very robust to noise, and they require careful threshold setting and training data for classification. Such requirements are troublesome in practical applications. The proposed method is robust to noise and is based on a statistical model for periodicity detection in the speech signal, without any training requirement.
To reduce the effect of noise on the speech signal, a data-adaptive time-domain filtering method is proposed using the recently developed empirical mode decomposition [8]. Although speech is non-stationary, Fourier-based frequency-domain filtering assumes that it is piecewise stationary and decomposes it by fitting predefined bases that do not respect its non-stationary nature. In contrast, the EMD-based approach decomposes the speech signal as a non-stationary time series and therefore performs better at noise filtering. A method for determining whether an observed time series contains a periodically correlated sequence is employed here. It is based on statistical tests of the coherence between spectral components for the presence of a periodically correlated covariance structure in a time series [9]. The autocorrelation function (ACF) makes any periodicity more prominent, so the proposed periodic correlation model is applied in the autocorrelation domain rather than in the original time domain of the speech signal. The periodicity detection is implemented in the spectral domain, classifying each speech segment as voiced or unvoiced according to whether or not it contains a periodically correlated sequence.
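The role of the normalized autocorrelation can be illustrated with a much simpler stand-in for the paper's classifier. The sketch below does not implement the statistical periodic-correlation test; it only thresholds the largest normalized ACF value inside an assumed pitch-lag range (60 to 400 Hz), with an assumed threshold of 0.5.

```python
import numpy as np


def normalized_acf(frame):
    """r[k] = sum_n x[n] x[n+k] / sum_n x[n]^2 for k >= 0 (mean removed first)."""
    x = np.asarray(frame, dtype=float) - np.mean(frame)
    full = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep non-negative lags
    if full[0] == 0:
        return np.zeros_like(full)  # silent frame: no periodicity
    return full / full[0]


def is_voiced(frame, fs, fmin=60.0, fmax=400.0, threshold=0.5):
    """Simplified V/Uv decision: strong ACF peak at a plausible pitch lag."""
    r = normalized_acf(frame)
    lo, hi = int(fs / fmax), int(fs / fmin)
    return bool(np.max(r[lo:hi + 1]) > threshold)


fs = 8000
n = np.arange(320)  # 40 ms frame
voiced = np.sin(2 * np.pi * 200 * n / fs)  # periodic, 40-sample period
unvoiced = np.random.default_rng(0).standard_normal(320)  # noise-like
```

For a periodic frame the ACF has a large peak at the pitch lag, whereas for noise it stays small at all non-zero lags; the paper's statistical model replaces the fixed threshold used here.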

  • Conference Article
  • 10.2991/isci-15.2015.69
A Novel Overcomplete Dictionary Training Based on Empirical Mode Decomposition and Its Performance Analysis
  • Jan 1, 2015
  • Shikui Wang + 1 more

In this paper, a novel overcomplete dictionary training method based on empirical mode decomposition (EMD) is presented. The intrinsic mode functions (IMFs) obtained by EMD are used to train the overcomplete dictionary, with the K-SVD algorithm adopted for training. Simulation results show that, compared with a dictionary trained directly on the original speech signals, the proposed overcomplete dictionary gives a sparser representation of speech signals and thus higher reconstructed speech quality.
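K-SVD alternates a sparse-coding step with a per-atom dictionary update. A minimal sketch of the update for a single atom is shown below, assuming the columns of `Y` are training segments (here random placeholders for IMF-derived data) and `X` holds sparse codes from a previous coding step; the function name `ksvd_update_atom` is hypothetical.

```python
import numpy as np


def ksvd_update_atom(Y, D, X, k):
    """Update atom k of D and row k of X with the K-SVD rank-1 SVD step.

    Y: (m, N) training signals, D: (m, K) unit-norm atoms, X: (K, N) sparse codes.
    The sparsity pattern of row k is kept fixed.
    """
    D, X = D.copy(), X.copy()
    omega = np.flatnonzero(X[k])  # signals that currently use atom k
    if omega.size == 0:
        return D, X  # unused atom: left unchanged in this sketch
    # Residual of those signals with atom k's contribution added back.
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]  # new unit-norm atom
    X[k, omega] = s[0] * Vt[0]  # matching coefficients
    return D, X


rng = np.random.default_rng(1)
Y = rng.standard_normal((16, 40))
D = rng.standard_normal((16, 24))
D /= np.linalg.norm(D, axis=0)
X = np.zeros((24, 40))
X[rng.integers(0, 24, size=40), np.arange(40)] = rng.standard_normal(40)  # 1-sparse codes
k = int(np.flatnonzero(X.any(axis=1))[0])  # pick an atom that is actually used
err_before = np.linalg.norm(Y - D @ X)
D2, X2 = ksvd_update_atom(Y, D, X, k)
err_after = np.linalg.norm(Y - D2 @ X2)
```

Because the leading singular pair gives the best rank-1 approximation of the restricted residual, this update cannot increase the total representation error.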

  • Conference Article
  • Citations: 17
  • 10.1109/embc.2012.6346311
Empirical mode decomposition as a tool to remove the function Electrical stimulation artifact from surface electromyograms: Preliminary investigation
  • Aug 1, 2012
  • R B Pilkar + 2 more

Rectification of surface EMGs during electrical stimulation (ES) is still an unsolved problem. The broadband frequency components of the ES artifact overlap with the EMG spectrum, making this task challenging. In this study, we investigate the potential of the empirical mode decomposition (EMD) method to remove the stimulus artifact from surface EMGs collected during such applications. We hypothesize that the EMD algorithm provides a suitable platform for decomposing the EMG signal into physically meaningful intrinsic modes that can be used to isolate the ES artifact. Basic EMD is tested on two signals: ES-induced EMG, and EMG from voluntary contractions with an added simulated ES signal. The algorithm isolates the EMG from the ES artifact with considerable success. Furthermore, combining EMD with the Teager–Kaiser energy operator (TKEO) gives an even better representation of the EMG signal. However, some high-frequency content was lost during reconstruction. Hence, the relationship between the EMD parameters and the stimulus artifact properties needs further investigation so that the algorithm can be optimized to reconstruct an artifact-free EMG signal with minimal loss of data.
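The energy operator mentioned above has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. A minimal sketch follows; how the study combines it with the EMD output is not specified in the abstract, so only the operator itself is shown.

```python
import numpy as np


def tkeo(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values (interior samples only).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


A, omega = 2.0, 0.3  # amplitude and digital frequency (rad/sample)
x = A * np.cos(omega * np.arange(100) + 0.7)
psi = tkeo(x)
```

For a pure tone the output is constant, psi = A^2 sin^2(omega), so the operator responds to both amplitude and frequency; this is why it is often used to emphasize bursts of muscle activity in EMG.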

  • Conference Article
  • Citations: 7
  • 10.1109/cmsp.2011.149
A Speech Denoising Method Based on Improved EMD
  • May 1, 2011
  • Zhang Jun-Chang + 1 more

Speech signals are inevitably corrupted by noise during generation, transmission, and reception, which results in speech distortion. In this paper, empirical mode decomposition (EMD), a method for non-stationary and nonlinear signal analysis, is applied to speech denoising. Furthermore, to address the problems of envelope fitting and interpolation-point selection in conventional EMD, an improved EMD is proposed that uses cubic Hermite interpolation instead of cubic splines for envelope fitting, and a doubly iterative sifting method instead of local extrema for interpolation-point selection. As a result, algorithmic errors can be reduced and overshoots and undershoots avoided. Simulations show that the proposed method reduces speech distortion and increases the output signal-to-noise ratio (SNR) compared with speech denoising based on wavelets and on conventional EMD.
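The difference between the two envelope-fitting choices can be sketched with SciPy's `PchipInterpolator` (a shape-preserving piecewise cubic Hermite interpolant) and `CubicSpline`. The sketch uses ordinary local maxima as knots and adds the signal end points as knots, a simple boundary treatment assumed here; it does not reproduce the paper's doubly iterative point selection.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator


def upper_envelope(x, method="pchip"):
    """Upper envelope through the local maxima of x.

    method="pchip": piecewise cubic Hermite (shape-preserving) interpolation.
    method="spline": cubic spline, as in classical EMD.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    knots = np.concatenate(([0], peaks, [len(x) - 1]))  # end points as knots (assumed)
    interp = PchipInterpolator if method == "pchip" else CubicSpline
    return interp(knots, x[knots])(n)


n = np.arange(400)
x = (1 + 0.5 * np.sin(2 * np.pi * n / 400)) * np.sin(2 * np.pi * n / 40)
env_pchip = upper_envelope(x)
env_spline = upper_envelope(x, method="spline")
```

PCHIP keeps each segment monotone between neighbouring knots, so its envelope never rises above the largest knot value or falls below the smallest one; a cubic spline has no such guarantee, which is the overshoot/undershoot issue the abstract refers to.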

  • Research Article
  • 10.1016/j.apacoust.2024.110210
Plug-and-play learnable EMD module for time-domain speech separation
  • Aug 8, 2024
  • Applied Acoustics
  • Xiaokang Yang + 2 more

  • Research Article
  • 10.24294/jche.v0i0.612
Robust speech recognition on an FPGA chip with dual core
  • Jun 10, 2018
  • Shing‐Tai Pan + 1 more

The purpose of this paper is to speed up the computation of empirical mode decomposition (EMD) on multi-core embedded systems for robust speech recognition. EMD is used to decompose noisy speech signals into several intrinsic mode functions (IMFs). These IMFs are multiplied by corresponding weights, trained with a genetic algorithm (GA), and combined to recover the original speech, yielding cleaner speech for recognition. Because EMD is computationally expensive, a parallel computation algorithm for a multi-core embedded architecture is proposed to reduce its computation time, which in turn improves the efficiency of speech recognition.
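The weighted recombination of IMFs can be sketched as follows. The abstract gives no GA details, so the population size, truncation selection, uniform crossover, Gaussian mutation, elitism, and the MSE fitness against a clean training reference are all assumptions; `ga_weights` and `recombine` are hypothetical names.

```python
import numpy as np


def recombine(imfs, weights):
    """Weighted sum of IMFs (rows of `imfs`) as an enhanced-speech estimate."""
    return weights @ imfs


def ga_weights(imfs, clean, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Tiny elitist genetic algorithm for IMF weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    k = imfs.shape[0]
    pop = rng.uniform(0.0, 1.5, size=(pop_size, k))
    pop[0] = 1.0  # include the plain reconstruction (all weights equal to 1)

    def mse(w):
        return np.mean((recombine(imfs, w) - clean) ** 2)

    for _ in range(generations):
        errors = np.array([mse(w) for w in pop])
        parents = pop[np.argsort(errors)[: pop_size // 2]]  # truncation selection
        idx = rng.integers(0, len(parents), size=(pop_size - 1, 2))
        mask = rng.random((pop_size - 1, k)) < 0.5  # uniform crossover
        children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
        children += rng.normal(0.0, sigma, size=children.shape)  # mutation
        pop = np.vstack([parents[0], children])  # elitism: keep the best so far
    errors = np.array([mse(w) for w in pop])
    return pop[np.argmin(errors)]


rng = np.random.default_rng(2)
imfs = rng.standard_normal((4, 200))  # placeholder IMFs; the first plays the "noise" IMF
clean = imfs[1] + imfs[2] + imfs[3]  # training reference without the first IMF
w = ga_weights(imfs, clean)
```

Because the best individual is always carried into the next generation and the all-ones vector is in the initial population, the returned weights are never worse than plain reconstruction on the training reference.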

  • Research Article
  • Citations: 18
  • 10.3389/fpsyg.2022.1075624
Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.
  • Jan 9, 2023
  • Frontiers in Psychology
  • Congshan Sun + 2 more

Speech emotion recognition (SER) is key to human-computer emotional interaction. However, the nonlinear characteristics of emotional speech are variable, complex, and subtly changing, so accurately recognizing emotions from speech remains a challenge. Empirical mode decomposition (EMD), an effective decomposition method for nonlinear, non-stationary signals, has been successfully used to analyze emotional speech signals. However, the mode mixing problem of EMD degrades the performance of EMD-based SER methods. Various improved EMD methods have been proposed to alleviate mode mixing, but they still suffer from mode mixing, residual noise, and long computation time, and their main parameters cannot be set adaptively. To overcome these problems, we propose a novel SER framework, named IMEMD-CRNN, that combines an improved version of masking signal-based EMD (IMEMD) with a convolutional recurrent neural network (CRNN). First, IMEMD is proposed to decompose speech. IMEMD is a novel disturbance-assisted EMD method that can set the parameters of the masking signals according to the nature of the signal. Second, we extract 43-dimensional time-frequency features that characterize emotion from the intrinsic mode functions (IMFs) obtained by IMEMD. Finally, we feed these features into a CRNN to recognize emotions. In the CRNN, 2D convolutional neural network (CNN) layers capture nonlinear local temporal and frequency information of the emotional speech, and bidirectional gated recurrent unit (BiGRU) layers further learn temporal context information. Experiments on the publicly available TESS and Emo-DB datasets demonstrate the effectiveness of the proposed IMEMD-CRNN framework. The TESS dataset consists of 2,800 utterances covering seven emotions recorded by two native English speakers.
The Emo-DB dataset consists of 535 utterances covering seven emotions recorded by ten native German speakers. The proposed IMEMD-CRNN framework achieves a state-of-the-art overall accuracy of 100% on the TESS dataset and 93.54% on the Emo-DB dataset, in both cases over seven emotions. IMEMD alleviates mode mixing and obtains IMFs with less noise and more physical meaning, with significantly improved efficiency. Overall, the IMEMD-CRNN framework significantly improves emotion recognition performance.

  • Preprint Article
  • Citations: 1
  • 10.32920/ryerson.14653131
Empirical analysis for non-stationary signal de-noising, de-trending and discrimination applications
  • May 23, 2021
  • Muhammad F Kaleem

This dissertation focuses on the study and development of methods for empirical analysis of non-stationary signals in the context of de-noising, de-trending and discrimination applications. For this purpose, Empirical Mode Decomposition (EMD), which is a relatively new signal decomposition technique, is chosen as the starting point. EMD does not rely on any fixed basis, but instead defines a signal adaptive decomposition methodology. The use of EMD for signal de-noising and de-trending is demonstrated through formulation of a methodology for mental task classification using EEG signals. Furthermore, a methodology for analysis and classification of pathological speech signals is developed, whereby a high classification accuracy through use of meaningful instantaneous features is demonstrated. Following this, a novel modification of EMD, named Empirical Mode Decomposition-Modified Peak Selection (EMD-MPS), is proposed. EMD-MPS allows a time-scale based decomposition of signals, which is not possible using the original EMD algorithm. The EMD-MPS algorithm is defined, and its properties empirically established, thereby validating the expected behaviour of EMD-MPS. Importantly, EMD-MPS is shown to provide new insight into the decomposition behaviour of the original EMD algorithm. Also, a novel hierarchical decomposition methodology, which uses the time-scale based decomposition of EMD-MPS to divide a signal into selected frequency bands, is developed and illustrated using synthetic and real world signals. EMD-MPS is also used for time-scale based de-noising and de-trending of signals, first demonstrated using synthetic and real signals, and then validated by practical applications such as mental task classification and seizure detection. An empirical sparse dictionary learning framework based on EMD with application to signal classification is then proposed and developed in the dissertation. 
As part of this framework, a discriminative dictionary learning algorithm is developed, and characteristics of the empirical dictionary established. The utility of the proposed framework for signal classification is demonstrated using EEG signals. The proposed framework is then applied for automated seizure detection using long-term EEG recordings, and the results are used to discuss the potential and implications for patient specific dictionaries, as well as the associated advantages of the framework when using long-term data.
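De-noising and de-trending with EMD are often done by partial reconstruction: leaving out the first (highest-frequency) IMFs to suppress noise, or leaving out the residue (and possibly the last IMFs) to remove the trend. The sketch below shows that generic idea only, assuming the IMFs and residue are already available; it is not the EMD-MPS time-scale method, whose details the abstract does not give.

```python
import numpy as np


def partial_reconstruction(imfs, residue, drop_first=0, drop_last=0, keep_residue=True):
    """Rebuild a signal from a subset of its EMD components.

    imfs: (K, N) array ordered from highest to lowest frequency.
    drop_first: number of leading IMFs to discard (de-noising heuristic).
    drop_last / keep_residue: discard slow components (de-trending).
    """
    imfs = np.asarray(imfs, dtype=float)
    kept = imfs[drop_first: imfs.shape[0] - drop_last]
    out = kept.sum(axis=0)  # zeros of length N if nothing is kept
    if keep_residue:
        out = out + residue
    return out


n = np.arange(8)
imfs = np.array([np.ones(8), 2 * np.ones(8), 3 * np.ones(8)])  # toy components
residue = 0.5 * n  # toy linear trend
full = partial_reconstruction(imfs, residue)
denoised = partial_reconstruction(imfs, residue, drop_first=1)
detrended = partial_reconstruction(imfs, residue, keep_residue=False)
```

Which IMFs count as "noise" or "trend" is signal-dependent; a fixed `drop_first` is the crudest choice, and time-scale-aware selection is the kind of refinement the dissertation pursues.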

  • Preprint Article
  • 10.32920/ryerson.14653131.v1
Empirical analysis for non-stationary signal de-noising, de-trending and discrimination applications
  • May 23, 2021
  • Muhammad F Kaleem

  • Research Article
  • Citations: 4
  • 10.1016/j.prime.2023.100267
Improving speech communication in the age of face masks: A study on EMD denoising method by subjective speech comparison
  • Sep 1, 2023
  • e-Prime - Advances in Electrical Engineering, Electronics and Energy
  • Marxim Rahula Bharathi B + 4 more
