Robust Voice Activity Detection Algorithm Research Articles

Deep learning has substantially advanced voice activity detection (VAD). However, feeding traditional inputs, such as raw waveforms and Mel-frequency cepstral coefficients, directly to deep neural networks often degrades VAD performance because of noise interference. In contrast, humans can discern speech in complex and noisy environments, which motivated us to draw inspiration from the human auditory system. We propose a robust VAD algorithm, the auditory-inspired masked modulation encoder based convolutional attention network (AMME-CANet), which integrates our auditory-inspired masked modulation encoder (AMME) with a convolutional attention network (CANet). First, we design auditory-inspired modulation features as a deep-learning encoder (AME) that simulates the transmission of sound signals to the inner ear hair cells and the subsequent modulation filtering by neural cells. Second, building on the masking effects observed in the human auditory system, we add a masking mechanism to the auditory-inspired modulation encoder, resulting in the AMME. The AMME amplifies cleaner speech frequencies while suppressing noise components. Third, inspired by the human auditory mechanism's use of contextual information, we apply an attention mechanism to VAD that assigns higher weights to the contextual information carrying richer and more informative cues. Extensive experiments and evaluation demonstrate the superior performance of AMME-CANet for VAD under challenging noise conditions.
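
The attention step described in this abstract, giving higher weights to the more informative context frames, can be illustrated with a minimal sketch. This is not the AMME-CANet architecture: it is generic scaled dot-product attention pooling written in plain Python, and the names `softmax` and `attend`, the toy frame vectors, and the query vector are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(context_frames, query):
    """Weight each context frame by its scaled dot-product similarity to the query.

    Returns the attention weights and the weighted sum of the frames.
    Both inputs are hypothetical feature vectors, not the paper's features.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(f * q for f, q in zip(frame, query)) / scale
        for frame in context_frames
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, context_frames))
        for d in range(len(query))
    ]
    return weights, pooled


# Three toy context frames; the middle one is most similar to the query.
frames = [[0.1, 0.0], [1.0, 1.0], [0.2, 0.1]]
weights, pooled = attend(frames, query=[1.0, 1.0])
print(weights)
print(pooled)
```

In this toy setup the frame most similar to the query receives the largest weight, so it dominates the pooled representation. A trained network would learn the query and the frame features rather than take them as fixed inputs.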

Voice activity detection (VAD) distinguishes human speech from other sounds, and various applications benefit from it, including speech coding and speech recognition. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech, background noise, or both. In many real-life applications, noise occurs unexpectedly, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on correct noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice, and they are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, so formants are attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to extract formants accurately from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm that overcomes the problem of detecting formants under severe noise. The proposed method achieves a much faster processing time than standard VAD algorithms and outperforms them under various noise conditions. Because it is robust against various types of noise and imposes a light computational load, it is suitable for use in a wide range of applications.
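
The observation that strong spectral peaks tend to survive additive noise can be shown with a small sketch. This is not the algorithm proposed in the paper, whose details are not given here. It is a generic peak-counting heuristic in plain Python, and the helper names (`make_frame`, `magnitude_spectrum`, `prominent_peaks`, `frame_has_peaks`), the median-based noise floor, and the thresholds `ratio` and `min_peaks` are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def make_frame(tone_bins, noise_std, n=128, seed=0):
    """Hypothetical test signal: unit sinusoids at the given DFT bins plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        sum(math.sin(2 * math.pi * k * t / n) for k in tone_bins)
        + rng.gauss(0.0, noise_std)
        for t in range(n)
    ]


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0 .. n/2 - 1 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def prominent_peaks(spectrum, ratio=4.0):
    """Local maxima that exceed `ratio` times the median magnitude.

    The median serves as a crude noise-floor estimate (an assumption).
    """
    floor = sorted(spectrum)[len(spectrum) // 2]
    return [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
        and spectrum[k] > ratio * floor
    ]


def frame_has_peaks(frame, min_peaks=2, ratio=4.0):
    """Flag a frame as speech-like if it has enough prominent spectral peaks."""
    return len(prominent_peaks(magnitude_spectrum(frame), ratio)) >= min_peaks


# Two "formant-like" tones buried in noise versus a noise-only frame.
print(frame_has_peaks(make_frame([10, 30], noise_std=0.5)))
print(frame_has_peaks(make_frame([], noise_std=0.5)))
```

The sketch also hints at the difficulty the abstract mentions: noise that itself contains narrow spectral peaks would pass this simple test, so a practical detector needs additional cues to tell formants apart from unrelated peaks.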

Articles published on Robust Voice Activity Detection Algorithm

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network

Voice Activity Detection Using Generalized Exponential Kernels for Time and Frequency Domains

Robust Voice Activity Detection Algorithm based on Long Term Dominant Frequency and Spectral Flatness Measure

Formant-Based Robust Voice Activity Detection

Robust Voice Activity Detection Based on Concept of Modulation Transfer Function in Noisy Reverberant Environments

Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation

A unified approach to speech enhancement and voice activity detection

Robust voice activity detection based on harmonic to noise ratio

A Robust, Real-Time Voice Activity Detection Algorithm for Embedded Mobile Devices

Robust voice activity detection algorithm for estimating noise spectrum

A Robust, Real-Time Voice Activity Detection Algorithm for Embedded Mobile Devices
