Abstract
Voice Activity Detection (VAD) is a widely used technique for separating speech regions from audio signals, with applications in speech coding, noise reduction, and other domains. Although various approaches have been proposed to improve VAD performance, such as ACAM, DCU-10, and Tr-VAD, they often share common limitations: they are ill-suited to long audio and are time-consuming. To address these issues, we propose a new method, AAT-VAD, which integrates an adaptive-width attention learning mechanism into the classic transformer framework. Our approach extracts Mel-scale Frequency Cepstral Coefficients (MFCC) from the Mel-scale frequency domain, adds a masking function to each transformer attention head, and feeds the features processed by the transformer encoder layers into a classifier. Experimental results indicate that our method achieves an F1-score 12.8% higher than DCU-10 and 0.6% higher than Tr-VAD under different noise interferences. Furthermore, the average detection cost function (DCF) value of our method is only 14.3% of that of DCU-10 and 92.4% of that of Tr-VAD, and the test time of AAT-VAD is only 37.4% of that of Tr-VAD on the same noisy speech mixtures.
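For concreteness, the per-head masking idea mentioned in the abstract can be sketched roughly as follows. This is not the authors' implementation: the module name, ramp length, and per-head span parameterisation are our own assumptions, modeled on standard adaptive attention-span formulations in which each head learns how far back (or around) it is allowed to attend.

```python
# A minimal sketch of adaptive-width attention masking (illustrative only).
# Each head learns a span fraction z; a soft ramp mask
#   m(d) = clamp((ramp + span - d) / ramp, 0, 1)
# down-weights attention to frames whose distance d from the query exceeds
# the learned span. `AdaptiveSpanMask`, `max_span`, and `ramp` are assumed names.
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    def __init__(self, n_heads: int, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = max_span
        self.ramp = ramp
        # one learnable span fraction per attention head
        self.z = nn.Parameter(torch.full((n_heads, 1, 1), 0.5))

    def forward(self, attn_scores: torch.Tensor) -> torch.Tensor:
        # attn_scores: (batch, n_heads, query_len, key_len), pre-softmax logits
        q_len, k_len = attn_scores.shape[-2:]
        pos_q = torch.arange(q_len, device=attn_scores.device).unsqueeze(1)
        pos_k = torch.arange(k_len, device=attn_scores.device).unsqueeze(0)
        dist = (pos_q - pos_k).abs().to(attn_scores.dtype)        # (q_len, key_len)
        span = self.z.clamp(0, 1) * self.max_span                 # (n_heads, 1, 1)
        # soft ramp: 1 inside the span, linearly decaying to 0 over `ramp` frames
        mask = torch.clamp((self.ramp + span - dist) / self.ramp, 0.0, 1.0)
        weights = torch.softmax(attn_scores, dim=-1) * mask       # re-weight each head
        return weights / (weights.sum(dim=-1, keepdim=True) + 1e-8)

# usage sketch: mask = AdaptiveSpanMask(n_heads=8, max_span=256)
# attn = mask(torch.randn(2, 8, 100, 100))
```

Because the span parameters are learned, heads that only need local context can shrink their effective width, which is one plausible route to the reduced test time reported relative to Tr-VAD.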