Deep learning for minimum mean-square error approaches to speech enhancement


References (showing 10 of 35 papers)
  • Cited by 70
  • 10.1109/icassp.2016.7472934
DNN-based enhancement of noisy and reverberant speech
  • Mar 1, 2016
  • Yan Zhao + 3 more

  • Cited by 85568
  • 10.1162/neco.1997.9.8.1735
Long short-term memory.
  • Nov 1, 1997
  • Neural computation
  • Sepp Hochreiter + 1 more

  • Cited by 27
  • 10.1109/embc.2016.7590807
Automatic switching between noise classification and speech enhancement for hearing aid devices.
  • Aug 1, 2016
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society
  • Fatemeh Saki + 1 more

  • Cited by 272
  • 10.1109/tassp.1980.1163353
A weighted overlap-add method of short-time Fourier analysis/Synthesis
  • Feb 1, 1980
  • IEEE Transactions on Acoustics, Speech, and Signal Processing
  • R Crochiere

  • Cited by 1284
  • 10.1109/taslp.2014.2364452
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
  • Jan 1, 2015
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Yong Xu + 3 more

  • Cited by 3385
  • 10.1109/tassp.1985.1164550
Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
  • Apr 1, 1985
  • IEEE Transactions on Acoustics, Speech, and Signal Processing
  • Y Ephraim + 1 more

  • Cited by 707
  • 10.1109/5.237532
Signal modeling techniques in speech recognition
  • Jan 1, 1993
  • Proceedings of the IEEE
  • J.W Picone

  • Cited by 74
  • 10.1109/icassp.2018.8462593
Perceptually Guided Speech Enhancement Using Deep Neural Networks
  • Apr 1, 2018
  • Yan Zhao + 3 more

  • Cited by 60
  • 10.1016/j.specom.2011.02.001
Modulation-domain Kalman filtering for single-channel speech enhancement
  • Feb 16, 2011
  • Speech Communication
  • Stephen So + 1 more

  • Cited by 4508
  • 10.1109/icassp.2015.7178964
Librispeech: An ASR corpus based on public domain audio books
  • Apr 1, 2015
  • Vassil Panayotov + 3 more

Citations (showing 10 of 113 papers)
  • Conference Article
  • 10.1109/icassp43922.2022.9746267
A Priori SNR Estimation for Speech Enhancement Based on PESQ-Induced Reinforcement Learning
  • May 23, 2022
  • Tong Lei + 3 more

Perceptual evaluation of speech quality (PESQ) is widely accepted as an effective objective metric closely related to the speech quality sensed by human listening perception. Due to its evaluation complexity and non-differentiability, PESQ is difficult to include in the cost function for deep learning-based speech enhancement. In this paper, we focus on introducing PESQ to improve Deep Xi, a recently proposed minimum mean square error (MMSE) based speech enhancement approach with the a priori signal-to-noise ratio (SNR) estimated by a deep neural network. Treating discrete a priori SNR values as actions, we apply reinforcement learning (RL) to select the optimal SNR at the frame level through a reward function associated with PESQ. The experimental results show that the RL-trained network is able to achieve a better PESQ score, especially in low SNR conditions.
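The reward design described above can be made concrete in a few lines. The sketch below is illustrative only and is not the authors' code: it assumes the third-party `pesq` Python package and 16 kHz signals, treats a discrete grid of a priori SNR values (in dB) as the RL action space, and uses the PESQ improvement over the noisy input as an utterance-level reward.

```python
# Minimal sketch: PESQ-based reward over a discrete a priori SNR action grid.
# Assumptions: `pip install pesq`, 16 kHz mono signals as 1-D numpy arrays.
import numpy as np
from pesq import pesq  # reference PESQ implementation; 'wb' selects wide-band mode

FS = 16000
XI_GRID_DB = np.linspace(-10.0, 30.0, 41)  # candidate a priori SNR actions (dB)

def wiener_gain(xi_db):
    """Gain applied by the MMSE (Wiener) estimator for a given a priori SNR in dB."""
    xi = 10.0 ** (np.asarray(xi_db) / 10.0)
    return xi / (1.0 + xi)

def pesq_reward(clean, noisy, enhanced):
    """Reward: PESQ improvement of the enhanced utterance over the noisy input."""
    return pesq(FS, clean, enhanced, 'wb') - pesq(FS, clean, noisy, 'wb')
```

In an RL setting like the one summarised above, each frame's action would index into `XI_GRID_DB`, the corresponding gains would be applied to the noisy spectrum before resynthesis, and the PESQ-based reward would then drive the policy update.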

  • Conference Article
  • 10.1109/3ict64318.2024.10824570
Gender-Specific Speech Enhancement Architecture for Improving Deep Neural Networks Learning
  • Nov 17, 2024
  • Soha A Nossier + 1 more

  • Conference Article
  • Cited by 11
  • 10.21437/interspeech.2020-1551
A Deep Learning-Based Kalman Filter for Speech Enhancement
  • Oct 25, 2020
  • Sujan Kumar Roy + 2 more

The existing Kalman filter (KF) suffers from poor estimates of the noise variance and the linear prediction coefficients (LPCs) in real-world noise conditions. This results in degraded speech enhancement performance. In this paper, a deep learning approach is used to more accurately estimate the noise variance and LPCs, enabling the KF to enhance speech in various noise conditions. Specifically, a deep learning approach to MMSE-based noise power spectral density (PSD) estimation, called DeepMMSE, is used. The estimated noise PSD is used to compute the noise variance. We also construct a whitening filter with its coefficients computed from the estimated noise PSD. It is then applied to the noisy speech, yielding pre-whitened speech for computing the LPCs. The improved noise variance and LPC estimates enable the KF to minimise the residual noise and distortion in the enhanced speech. Experimental results show that the proposed method exhibits higher quality and intelligibility in the enhanced speech than the benchmark methods in various noise conditions for a wide range of SNR levels.
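The noise-PSD-to-LPC pipeline summarised above can be sketched with numpy/scipy. This is a rough illustration under stated assumptions (the LPC order, the variable names, and the use of `solve_toeplitz` are choices of this sketch, not the authors' implementation): the estimated noise PSD yields noise LPCs via the autocorrelation method, those LPCs define an FIR whitening filter, and speech LPCs are then estimated from the pre-whitened signal.

```python
# Hedged sketch: whitening filter and LPCs derived from an estimated noise PSD.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_from_psd(psd, order):
    """LPCs and prediction-error variance from a one-sided PSD via the
    autocorrelation method (inverse FFT of the PSD + Yule-Walker equations)."""
    r = np.fft.irfft(psd)[: order + 1]               # autocorrelation sequence
    a = solve_toeplitz(r[:order], r[1 : order + 1])  # predictor coefficients
    var = r[0] - np.dot(a, r[1 : order + 1])         # prediction-error variance
    return a, var

def prewhiten_and_estimate(noisy, noise_psd, order=10):
    """Whiten the noisy speech with an FIR filter built from the noise LPCs,
    then estimate speech LPCs from the pre-whitened signal."""
    a_noise, _ = lpc_from_psd(noise_psd, order)
    whitening_fir = np.concatenate(([1.0], -a_noise))  # prediction-error filter
    whitened = lfilter(whitening_fir, [1.0], noisy)
    r = np.correlate(whitened, whitened, mode="full")[len(whitened) - 1 :][: order + 1]
    a_speech = solve_toeplitz(r[:order], r[1 : order + 1])
    return a_speech, whitened
```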

  • Open Access
  • Conference Article
  • Cited by 7
  • 10.1109/csde50874.2020.9411566
Deep Xi as a Front-End for Robust Automatic Speech Recognition
  • Dec 16, 2020
  • Aaron Nicolson + 1 more

Currently, deep learning approaches to speech enhancement are most commonly used as front-ends for robust automatic speech recognition (ASR). A recently proposed deep learning approach to a priori SNR estimation, called Deep Xi, was able to produce enhanced speech at a higher quality and intelligibility than recent deep learning approaches to speech enhancement. Motivated by this, we investigate Deep Xi as a front-end for robust ASR. Deep Xi is evaluated using real-world non-stationary and coloured noise sources at multiple SNR levels. Our experimental investigation shows that Deep Xi as a front-end is able to produce a lower word error rate than recent deep learning approaches to speech enhancement. The results presented in this work show that Deep Xi is a viable front-end, and is able to significantly increase the robustness of an ASR system. Availability: Deep Xi is available at https://github.com/anicolson/DeepXi.

  • Research Article
  • Cited by 4
  • 10.1007/s11042-022-12632-6
Speech enhancement using U-nets with wide-context units
  • Mar 9, 2022
  • Multimedia Tools and Applications
  • Tomasz Grzywalski + 1 more

  • Open Access
  • Research Article
  • Cited by 14
  • 10.1186/s13634-021-00813-8
Speech enhancement from fused features based on deep neural network and gated recurrent unit network
  • Oct 24, 2021
  • EURASIP Journal on Advances in Signal Processing
  • Youming Wang + 3 more

In real environments, speech is easily corrupted by external noise, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential in solving nonlinear mapping problems for complex features. However, a deficiency of traditional deep learning methods is their weak ability to learn important information from previous time steps and long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and a gated recurrent unit (GRU). The proposed method uses the GRU to reduce the number of parameters of the DNN and acquire the context information of the speech, which improves the enhanced speech quality and intelligibility. First, a DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Second, the LPS feature produced by the DNN is fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping relationship between these features and the LPS features of clean speech. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noise signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases, and it outperforms other common methods in speech enhancement.
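A hedged PyTorch sketch of the DNN + GRU fusion described above is given below. The layer sizes, ReLU activations, and concatenation-based fusion are assumptions made for illustration rather than the paper's exact architecture: a frame-wise DNN estimate of the clean log-power spectrum (LPS) is concatenated with the noisy LPS and passed to a GRU that supplies temporal context.

```python
# Illustrative DNN + GRU fusion model for log-power-spectrum (LPS) enhancement.
import torch
import torch.nn as nn

class DnnGruFusion(nn.Module):
    def __init__(self, n_bins=257, hidden=512, gru_hidden=256):
        super().__init__()
        self.dnn = nn.Sequential(            # frame-wise mapping DNN
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins),
        )
        self.gru = nn.GRU(input_size=2 * n_bins, hidden_size=gru_hidden,
                          batch_first=True)  # adds temporal context
        self.out = nn.Linear(gru_hidden, n_bins)

    def forward(self, noisy_lps):            # noisy_lps: (batch, frames, n_bins)
        dnn_est = self.dnn(noisy_lps)        # frame-wise DNN estimate of clean LPS
        fused = torch.cat([noisy_lps, dnn_est], dim=-1)  # feature fusion
        seq, _ = self.gru(fused)
        return self.out(seq)                 # estimated clean LPS
```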

  • Open Access
  • Research Article
  • Cited by 38
  • 10.1016/j.ecoinf.2024.102514
Assessing water quality of an ecologically critical urban canal incorporating machine learning approaches
  • Feb 13, 2024
  • Ecological Informatics
  • Abdul Majed Sajib + 6 more

This study assessed water quality (WQ) in Tongi Canal, an ecologically critical and economically important urban canal in Bangladesh. The researchers employed the Root Mean Square Water Quality Index (RMS-WQI) model, utilizing seven WQ indicators, including temperature, dissolved oxygen, electrical conductivity, lead, cadmium, and iron, to calculate the water quality index (WQI) score. The results showed that most of the water sampling locations showed poor WQ, with many indicators violating Bangladesh's environmental conservation regulations. This study employed eight machine learning algorithms, where the Gaussian process regression (GPR) model demonstrated superior performance (training RMSE = 1.77, testing RMSE = 0.0006) in predicting WQI scores. To validate the GPR model's performance, several performance measures, including the coefficient of determination (R2), the Nash-Sutcliffe efficiency (NSE), the model efficiency factor (MEF), Z statistics, and Taylor diagram analysis, were employed. The GPR model exhibited higher sensitivity (R2 = 1.0) and efficiency (NSE = 1.0, MEF = 0.0) in predicting WQ. The analysis of model uncertainty (standard uncertainty = 7.08 ± 0.9025; expanded uncertainty = 7.08 ± 1.846) indicates that the RMS-WQI model holds potential for assessing the WQ of inland waterbodies. These findings indicate that the RMS-WQI model could be an effective approach for assessing inland waters across Bangladesh. The study's results showed that most of the WQ indicators did not meet the recommended guidelines, indicating that the water in the Tongi Canal is unsafe and unsuitable for various purposes. The study's implications extend beyond the Tongi Canal and could contribute to WQ management initiatives across Bangladesh.

  • Open Access
  • Research Article
  • Cited by 13
  • 10.1109/access.2021.3075209
DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement
  • Jan 1, 2021
  • IEEE Access
  • Sujan Kumar Roy + 2 more

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC is compared to six existing deep learning-based methods. Compared to other deep learning approaches to clean speech LPC estimation, DeepLPC produces a lower spectral distortion (SD) level than existing methods, confirming that it exhibits less bias. DeepLPC also produced higher objective scores than any of the competing methods (with an improvement of 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC was also the most preferred by 10 listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech at a higher quality and intelligibility.
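The step from estimated LPC power spectra to LPCs and prediction-error variances can be written out directly. The numpy sketch below is an illustration under assumed settings (LPC order, one-sided power spectrum), not DeepLPC itself: the inverse FFT gives the autocorrelation sequence, and the Levinson-Durbin recursion then yields the predictor coefficients and error variance for the augmented Kalman filter.

```python
# Hedged sketch: LPC power spectrum -> autocorrelation -> Levinson-Durbin recursion.
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for predictor coefficients a[1..order]
    and return them with the final prediction-error variance."""
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
        k = acc / err                               # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[i - 1 :: -1][:i]
        a = a_new
        err *= (1.0 - k * k)
    return a, err

def lpc_from_power_spectrum(power_spec, order=16):
    r = np.fft.irfft(power_spec)                    # autocorrelation sequence
    return levinson_durbin(r, order)
```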

  • Open Access
  • Research Article
  • Cited by 6
  • 10.1121/10.0004823
On training targets for deep learning approaches to clean speech magnitude spectrum estimation.
  • May 1, 2021
  • The Journal of the Acoustical Society of America
  • Aaron Nicolson + 1 more

Estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Training targets for deep learning approaches to clean speech MS estimation fall into three categories: computational auditory scene analysis (CASA), MS, and minimum mean square error (MMSE) estimator training targets. The choice of the training target can have a significant impact on speech enhancement/separation and robust ASR performance. Motivated by this, the training target that produces enhanced/separated speech at the highest quality and intelligibility and that which is best for an ASR front-end is found. Three different deep neural network (DNN) types and two datasets, which include real-world nonstationary and coloured noise sources at multiple signal-to-noise ratio (SNR) levels, were used for evaluation. Ten objective measures were employed, including the word error rate of the Deep Speech ASR system. It is found that training targets that estimate the a priori SNR for MMSE estimators produce the highest objective quality scores. Moreover, it is established that the gain of MMSE estimators and the ideal amplitude mask produce the highest objective intelligibility scores and are most suitable for an ASR front-end.
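For concreteness, the three training-target families compared above can be computed from the clean-speech, noise, and noisy-mixture short-time magnitude spectra as in the short numpy sketch below; these are illustrative textbook definitions rather than the paper's exact targets (the compressed or CDF-mapped versions used in practice are omitted).

```python
# Illustrative training targets computed from |S| (clean), |D| (noise), |X| (mixture),
# each an array shaped (frames, frequency_bins).
import numpy as np

def training_targets(S_mag, D_mag, X_mag, eps=1e-12):
    xi = (S_mag ** 2) / (D_mag ** 2 + eps)   # a priori SNR (MMSE estimator target)
    iam = S_mag / (X_mag + eps)              # ideal amplitude mask (MS target)
    wiener_gain = xi / (1.0 + xi)            # gain of the MMSE (Wiener) estimator
    return xi, iam, wiener_gain
```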

  • Book Chapter
  • 10.1007/978-981-99-4742-3_28
Improving the Accuracy of Deep Learning Modelling Based on Statistical Calculation of Mathematical Equations
  • Jan 1, 2023
  • Feng Li + 1 more

Similar Papers
  • Research Article
  • Cited by 5
  • 10.1121/10.0002113
Spectral distortion level resulting in a just-noticeable difference between an a priori signal-to-noise ratio estimate and its instantaneous case.
  • Oct 1, 2020
  • The Journal of the Acoustical Society of America
  • Aaron Nicolson + 1 more

Minimum mean-square error (MMSE) approaches to speech enhancement are widely used in the literature. The quality of enhanced speech produced by an MMSE approach is directly impacted by the accuracy of the employed a priori signal-to-noise ratio (SNR) estimator. In this paper, the a priori SNR estimate spectral distortion (SD) level that results in a just-noticeable difference (JND) in the perceived quality of MMSE approach enhanced speech is found. The JND SD level is indicative of the accuracy that an a priori SNR estimator must exceed to have no impact on the perceived quality of MMSE approach enhanced speech. To measure the JND SD level, listening tests are conducted across five SNR levels, five noise sources, and two MMSE approaches [the MMSE short-time spectral amplitude (MMSE-STSA) estimator and the Wiener filter]. A statistical analysis of the results indicates that the JND SD level increases with the SNR level, is higher for the MMSE-STSA estimator, and is not impacted by the type of background noise. Following the literature, a significant improvement in a priori SNR estimation accuracy is required to reach the JND SD level.
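A common dB-domain definition of the spectral distortion between an a priori SNR estimate and its instantaneous (oracle) value is sketched below in numpy; this particular formula is an assumption of the sketch and may differ in detail from the measure used in the paper.

```python
# Frame-averaged RMS spectral distortion (dB) between two a priori SNR tracks.
import numpy as np

def spectral_distortion(xi_est, xi_inst, eps=1e-12):
    """xi_est, xi_inst: linear-scale a priori SNR arrays shaped (frames, bins)."""
    d = 10.0 * np.log10(xi_est + eps) - 10.0 * np.log10(xi_inst + eps)
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=-1))))
```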

  • Dissertation
  • 10.25904/1912/4020
Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing
  • Dec 4, 2020
  • Aaron Nicolson

Speech corrupted by background noise (or noisy speech) can cause misinterpretation and fatigue during phone and conference calls, and for hearing aid users. Noisy speech can also severely impact the performance of speech processing systems such as automatic speech recognition (ASR), automatic speaker verification (ASV), and automatic speaker identification (ASI) systems. Currently, deep learning approaches are employed in an end-to-end fashion to improve robustness. The target speech (or clean speech) is used as the training target or large noisy speech datasets are used to facilitate multi-condition training. In this dissertation, we propose competitive alternatives to the preceding approaches by updating two classic robust speech processing techniques using deep learning. The two techniques include minimum mean-square error (MMSE) and missing data approaches. An MMSE estimator aims to improve the perceived quality and intelligibility of noisy speech. This is accomplished by suppressing any background noise without distorting the speech. Prior to the introduction of deep learning, MMSE estimators were the standard speech enhancement approach. MMSE estimators require the accurate estimation of the a priori signal-to-noise ratio (SNR) to attain a high level of speech enhancement performance. However, current methods produce a priori SNR estimates with a large tracking delay and a considerable amount of bias. Hence, we propose a deep learning approach to a priori SNR estimation that is significantly more accurate than previous estimators, called Deep Xi. Through objective and subjective testing across multiple conditions, such as real-world non-stationary and coloured noise sources at multiple SNR levels, we show that Deep Xi allows MMSE estimators to produce the highest quality enhanced speech amongst all clean speech magnitude spectrum estimators. Missing data approaches improve robustness by performing inference only on noisy speech features that reliably represent clean speech. In particular, the marginalisation method was able to significantly increase the robustness of Gaussian mixture model (GMM)-based speech classification systems (e.g. GMM-based ASR, ASV, or ASI systems) in the early 2000s. However, deep neural networks (DNNs) used in current speech classification systems are non-probabilistic, a requirement for marginalisation. Hence, multi-condition training or noisy speech pre-processing is used to increase the robustness of DNN-based speech classification systems. Recently, sum-product networks (SPNs) were proposed, which are deep probabilistic graphical models that can perform the probabilistic queries required for missing data approaches. While available toolkits for SPNs are in their infancy, we show through an ASI task that SPNs using missing data approaches could be a strong alternative for robust speech processing in the future. This dissertation demonstrates that MMSE estimators and missing data approaches are still relevant approaches to robust speech processing when assisted by deep learning.

  • Conference Article
  • Cited by 6
  • 10.1109/acssc.1995.540554
A blind adaptive interference cancellation scheme for CDMA systems
  • Oct 30, 1995
  • J.B Schodorf + 1 more

An interference cancellation scheme for CDMA communication systems is presented. The algorithm is based on constrained optimization where receiver output power is minimized subject to the constraint that a desired code be passed with no distortion. Output power minimization (OPM) schemes have been shown to be equivalent to minimum mean square error (MMSE) approaches. Like MMSE approaches, OPM schemes do not require explicit knowledge of the interference structure. Unlike most MMSE approaches, however, OPM schemes do not require training. The algorithms are blind in the sense that only knowledge of the desired user's code (not the actual transmitted bit sequence) and associated timing is necessary. Moreover, the algorithms are linear and, therefore, not susceptible to misconvergence. The particular OPM implementation presented is based on the generalized sidelobe canceller structure which was originally developed for adaptive cancellation of spatial interference in array signal processing applications. Algorithm performance results are presented in the form of signal to interference ratios and bit error rate curves.
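The constrained output-power minimisation described above has a well-known closed form, sketched below in numpy for the generic minimum-variance case (this is not the paper's generalized sidelobe canceller implementation): minimise w^H R w subject to w^H c = 1, which gives w = R^{-1} c / (c^H R^{-1} c).

```python
# Hedged sketch: minimum-output-power weights under a distortionless constraint.
import numpy as np

def min_output_power_weights(R, c):
    """R: (N, N) Hermitian covariance of the received signal;
    c: (N,) desired user's code (or steering) vector."""
    r_inv_c = np.linalg.solve(R, c)
    return r_inv_c / (np.conj(c) @ r_inv_c)
```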

  • Research Article
  • Cited by 48
  • 10.1016/j.apacoust.2020.107647
LSTM-convolutional-BLSTM encoder-decoder network for minimum mean-square error approach to speech enhancement
  • Sep 11, 2020
  • Applied Acoustics
  • Zeyu Wang + 3 more

  • Conference Article
  • Cited by 2
  • 10.1109/icspcc52875.2021.9564668
An NMF-based MMSE Approach for Single Channel Speech Enhancement Using Densely Connected Convolutional Network
  • Aug 17, 2021
  • Xinyu Li + 2 more

Presently, because of the development of deep learning technology, increasing attention has been paid to state-of-the-art masking- and mapping-based speech enhancement methods. However, traditional speech enhancement approaches, such as the minimum mean-square error (MMSE) estimator and the Wiener filter (WF), have not been fully investigated. To better characterize these, we propose a deep learning-based MMSE approach for single-channel speech enhancement based on Non-negative Matrix Factorization (NMF). The performance of the MMSE approach can be improved by an accurate a priori signal-to-noise ratio (SNR) estimate. Therefore, we utilize an NMF-based Densely Connected Convolutional Network (DenseNet) as an estimator of the a priori SNR. In the test stage, speech at multiple SNR levels from colored noise sources and real-world non-stationary noise sources was used for evaluation. As expected, the proposed method outperformed many previous speech enhancement methods.

  • Research Article
  • Cited by 12
  • 10.1016/j.specom.2014.12.002
Speech enhancement based on β-order MMSE estimation of Short Time Spectral Amplitude and Laplacian speech modeling
  • Dec 9, 2014
  • Speech Communication
  • Hamid Reza Abutalebi + 1 more

  • Conference Article
  • Cited by 1
  • 10.1117/12.863504
Reconstruction for distributed video coding: a Markov random field approach with context-adaptive smoothness prior
  • Jul 11, 2010
  • Yongsheng Zhang + 3 more

An important issue in Wyner-Ziv video coding is the reconstruction of Wyner-Ziv frames with decoded bit-planes. So far, there are two major approaches: the Maximum a Posteriori (MAP) reconstruction and the Minimum Mean Square Error (MMSE) reconstruction algorithms. However, these approaches do not exploit smoothness constraints in natural images. In this paper, we model a Wyner-Ziv frame by Markov random fields (MRFs), and produce reconstruction results by finding an MAP estimation of the MRF model. In the MRF model, the energy function consists of two terms: a data term, MSE distortion metric in this paper, measuring the statistical correlation between side-information and the source, and a smoothness term enforcing spatial coherence. In order to better describe the spatial constraints of images, we propose a context-adaptive smoothness term by analyzing the correspondence between the output of Slepian-Wolf decoding and successive frames available at decoders. The significance of the smoothness term varies in accordance with the spatial variation within different regions. To some extent, the proposed approach is an extension to the MAP and MMSE approaches by exploiting the intrinsic smoothness characteristic of natural images. Experimental results demonstrate a considerable performance gain compared with the MAP and MMSE approaches.
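Written out, the two-term energy described above has the generic MRF form below. This is a hedged reconstruction in standard notation, not the paper's exact formulation: lambda, the neighbourhood set N, the weights w_ij, and the penalty rho are placeholders.

```latex
E(\mathbf{x}) = \sum_{i} \left( x_i - y_i \right)^2
              + \lambda \sum_{(i,j) \in \mathcal{N}} w_{ij}\, \rho\!\left( x_i - x_j \right),
\qquad
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} E(\mathbf{x})
```

Here the first (data) term is the MSE between the reconstruction x and the side information y, and the second (smoothness) term enforces spatial coherence, with the weights adapted to the local context.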

  • Conference Article
  • Cited by 3
  • 10.1109/wcsp.2013.6677271
Handover based on receive beamforming in high mobility cellular communication networks
  • Oct 1, 2013
  • Cuiling Qi + 2 more

This paper provides a new approach to solve the handover problem in high mobility cellular communication networks. In particular, the mobile node, e.g., the high-speed train (HST), is equipped with multiple antennas, by which it finds the target base station (BS) quickly with only receive beamforming. When only imperfect channel state information (CSI) is available, the output signal-to-noise ratio (SNR) of the receive beamformer will be subject to interference from signals from different nearby BSs, which may lead to possible errors in the selection of the target BS. To deal with this, two methods, i.e., maximum-ratio combining (MRC) and Minimum Mean Square Error (MMSE), are proposed to obtain the optimal beamforming vectors. Simulation results show that both of the proposed methods perform well, and the MMSE approach has good robustness.

  • Conference Article
  • Cited by 5
  • 10.1109/hscma.2014.6843264
Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement
  • May 1, 2014
  • Shunsuke Nakai + 4 more

In this paper, we provide a theoretical analysis of the minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator with biased a priori SNR estimation and its extension to musical-noise-free speech enhancement. Recently, musical-noise-free speech enhancement has been proposed, where no musical noise is generated in iterative spectral subtraction. However, the existence of a musical-noise-free condition for the MMSE-STSA estimator has not previously been reported. Therefore, in this paper, we show via theoretical analysis that the musical-noise-free condition exists in the biased MMSE-STSA estimator. In addition, we perform comparative experiments and clarify the efficacy of the proposed musical-noise-free speech enhancement.

  • Conference Article
  • Cited by 2
  • 10.1109/wocn.2013.6616178
A DPC based MMSE beamforming design for a MIMO system with interference
  • Jul 1, 2013
  • Prateek Rathore + 1 more

In this paper, a Dirty Paper Coding (DPC) based Minimum Mean Square Error (MMSE) beamforming design for a MIMO interference channel is introduced. It includes signal leakage and linear transmit filters. At each transmitter, an MMSE approach is used that considers the signal together with the interference. The simulation results show that the proposed design achieves a significant reduction in the mean square error (MSE) value compared to the leakage-based MMSE approach. An optimal DPC strategy is used for cancelling causal interference.

  • Conference Article
  • Cited by 7
  • 10.1109/icassp.2005.1415749
Reduced-Complexity Equalization for MC-CDMA Systems over Time-Varying Channels
  • Mar 18, 2005
  • L Rugini + 2 more

We present a low-complexity equalizer for multicarrier code-division multiple-access (MC-CDMA) downlink systems over time-varying (TV) multipath channels with non-negligible Doppler spread. The equalization algorithm, which is based on a block minimum mean-squared error (MMSE) approach, exploits the band structure of the frequency-domain channel matrix by means of a band LDL^H factorization. The complexity of the proposed block MMSE equalizer is linear in the number of subcarriers, and smaller than that of a serial MMSE equalizer characterized by a similar performance.

  • Research Article
  • Cited by 21
  • 10.1109/tsp.2003.810288
MMSE equalization of downlink CDMA channel utilizing unused orthogonal spreading sequences
  • May 1, 2003
  • IEEE Transactions on Signal Processing
  • Jinho Choi

For the code division multiple access (CDMA) downlink channel, chip-level equalization is considered in this paper. There is no interference after despreading if all spreading codes are orthogonal, as in IS-95. However, this does not hold for a frequency-selective fading channel. In this case, chip-level equalization can be applied to restore the orthogonality. We investigate chip-level equalization using finite impulse response (FIR) equalizers for a mobile station with multiple receive antennas. A blind approach and the minimum mean square error (MMSE) approach with a code-multiplexed pilot are considered. A generalized MMSE equalization, which combines the MMSE and blind approaches, is also investigated. It is shown that the generalized MMSE equalizer can effectively increase the number of samples used to track the variation of the channel and thereby performs better when the coherence time is small. In addition, we derive closed-form solutions of the blind, MMSE, and generalized MMSE equalizers for given channels.

  • Research Article
  • Cited by 2
  • 10.1049/el.2012.1472
Sum-rate maximising in cognitive MIMO ad-hoc networks using weighted MMSE approach
  • Sep 13, 2012
  • Electronics Letters
  • X Gui + 2 more

This work focuses on weighted sum-rate maximisation (WSRM) in cognitive radio ad-hoc networks, where a K-user multiple-input multiple-output interference network (K-MIMO-IFN) shares the same spectrum with a licensed primary user. Because the WSRM problem in the K-MIMO-IFN is non-convex and it is difficult to obtain an optimal solution directly, the weighted minimum mean square error (MMSE) approach is used to make the problem easier to handle. A dual-MMSE algorithm is then proposed, which iteratively finds a locally optimal solution and only needs local channel knowledge. Simulation results show that the proposed algorithm outperforms other conventional algorithms.

  • Conference Article
  • Cited by 13
  • 10.1109/iscas.1993.393677
Design of optimum interpolation filters for digital demodulators
  • May 3, 1993
  • V Zivojnovic + 1 more

Interpolation filtering in digital demodulators with fixed input sampling rate is treated. The procedures for optimum interpolation filter design which account for the peculiarities of the digital demodulators are presented. Minimum mean square error (MMSE) interpolation filters are discussed, and a design procedure for standard input signals is given. For some hardwired implementations the on-line computation of the filter coefficients can be advantageous. For those cases, the MMSE approach is almost infeasible and polynomial approximations are used. New design algorithms for coefficient computation for optimum polynomial interpolators are given. The design procedure is illustrated by a number of examples, and the results are compared to the nonpolynomial MMSE case.

  • Conference Article
  • Cited by 2
  • 10.1109/icdsp.2015.7252070
Statistical-model-based speech enhancement with musical-noise-free properties
  • Jul 1, 2015
  • Hiroshi Saruwatari

In this paper, we address theoretical studies on the existence of musical-noise-free conditions for statistical-model-based speech enhancement methods. Recently, musical-noise-free speech enhancement has been proposed, where no musical noise is generated in iterative spectral subtraction, iterative Wiener filtering, and the minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. As an extension of this theory to more flexible speech enhancement algorithms, in this paper, we reveal that the musical-noise-free condition exists in the methods with the a priori statistical speech models, e.g., the biased generalized MMSE-STSA estimator, via higher-order-statistics analysis. In addition, we perform comparative experiments and clarify the efficacy of the proposed musical-noise-free speech enhancement.

More from: Speech Communication
  • Research Article
  • 10.1016/j.specom.2025.103314
LORT: Locally refined convolution and Taylor transformer for monaural speech enhancement
  • Nov 1, 2025
  • Speech Communication
  • Junyu Wang + 5 more

  • Research Article
  • 10.1016/j.specom.2025.103317
Direct speech-to-speech neural machine translation: A survey
  • Nov 1, 2025
  • Speech Communication
  • Mahendra Gupta + 2 more

  • Research Article
  • 10.1016/j.specom.2025.103328
Categorization of patients affected with neurogenerative dysarthria among Hindi-speaking population and analyzing factors causing reduced speech intelligibility at the human-machine interface
  • Nov 1, 2025
  • Speech Communication
  • Raj Kumar + 3 more

  • Research Article
  • 10.1016/j.specom.2025.103313
MDCNN: A multimodal dual-CNN recursive model for fake news detection via audio- and text-based speech emotion recognition
  • Nov 1, 2025
  • Speech Communication
  • Hongchen Wu + 13 more

  • Research Article
  • 10.1016/j.specom.2025.103327
FinnAffect: An affective speech corpus for spontaneous finnish
  • Nov 1, 2025
  • Speech Communication
  • Kalle Lahtinen + 2 more

  • Research Article
  • 10.1016/j.specom.2025.103305
Phonetic reduction is associated with positive assessment and other pragmatic functions
  • Nov 1, 2025
  • Speech Communication
  • Nigel G Ward + 3 more

  • Research Article
  • 10.1016/j.specom.2025.103316
Robustness of emotion recognition in dialogue systems: A study on third-party API integrations and black-box attacks
  • Oct 1, 2025
  • Speech Communication
  • Fatma Gumus + 1 more

  • Research Article
  • 10.1016/j.specom.2025.103323
Noise-Robust Feature Extraction for Keyword Spotting Based on Supervised Adversarial Domain Adaptation Training Strategies
  • Oct 1, 2025
  • Speech Communication
  • Yongqiang Chen + 4 more

  • Research Article
  • 10.1016/j.specom.2025.103319
A survey of deep learning for complex speech spectrograms
  • Oct 1, 2025
  • Speech Communication
  • Yuying Xie + 1 more

  • Research Article
  • 10.1016/j.specom.2025.103315
An acoustic analysis of the nasal electrolarynx in healthy participants
  • Oct 1, 2025
  • Speech Communication
  • Ching-Hung Lai + 8 more
