Voice Activity Detection Research Articles

Anatomy-based fitting (ABF), a relatively new technique for cochlear implant programming, aims to reduce the frequency-to-place mismatch (FPM) caused by the location of the inserted electrode array. This study compared vowel and consonant perception in quiet and in noise among experienced adult cochlear implant (CI) users with the ABF map and with the conventional-based fitting (CBF) map used before ABF, over six months. Nine ears from eight experienced adult CI users were included in this longitudinal experimental study. Post-operative computed tomography (CT) scans were analyzed with the surgical planning software Otoplan to determine the locations of the intracochlear electrodes and their insertion angles. The anatomy-based frequency bands were generated by the Maestro 9.0 CI fitting software from the Otoplan data. Recognition scores for consonant-vowel-consonant (CVC) nonsense syllables in quiet and in noise (+5 dB SNR) were compared at baseline and at three and six months after ABF. The vowels were /a, i, u/; the consonants were the voiced plosives /b, d, g/ and the voiceless plosives /p, t, k/. Speech stimuli were presented at 30 dB SL in a sound-treated room through a loudspeaker positioned at 0° azimuth. On average, relative to the CBF maps, the ABF maps shifted center frequencies by amounts ranging from 0.46 semitones (0.04 octave) at electrode E12 to 23.94 semitones (1.99 octaves) at electrode E1. Mean vowel and consonant identification scores in quiet and in noise were significantly higher with ABF than with CBF (p < 0.05), with a large effect size, and scores tended to improve over time. Voiced consonants were identified better than voiceless consonants. The results demonstrate improved perception of vowels and consonants with the ABF maps, particularly for sounds containing voicing cues, and suggest that ABF could be more effective for voice detection in noise. Overall, the findings indicate that correcting place mismatch with an ABF map may improve speech perception, at least among experienced adult CI users.
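The center-frequency shifts in this abstract are given in both semitones and octaves. The conversion can be sketched as follows, assuming the usual equal-tempered definition of 12 semitones per octave. The function names are illustrative and the 500 Hz and 1000 Hz values are made-up inputs, not data from the study.

```python
import math


def semitone_shift(f_from_hz: float, f_to_hz: float) -> float:
    """Shift from f_from_hz to f_to_hz in semitones (12 semitones = 1 octave)."""
    if f_from_hz <= 0 or f_to_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def semitones_to_octaves(semitones: float) -> float:
    """Convert a shift in semitones to octaves."""
    return semitones / 12.0


# Hypothetical example: doubling a center frequency is a shift of one octave.
shift = semitone_shift(500.0, 1000.0)
print(f"{shift:.2f} semitones = {semitones_to_octaves(shift):.2f} octave")

# The two shifts reported in the abstract, converted to octaves.
for reported in (0.46, 23.94):
    print(f"{reported} semitones = {semitones_to_octaves(reported):.3f} octaves")
```

Under this definition, 0.46 semitones is about 0.038 octave, which matches the reported 0.04 after rounding, and 23.94 semitones is 1.995 octaves, which the abstract gives as 1.99.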


Edge computing empowered by artificial intelligence (AI) has given rise to a new paradigm and facilitated the development of multimedia applications. Speech assistants are among the most significant services these applications provide, offering intelligent interaction between humans and machines. However, malicious attackers may use spoofed speech to deceive speech assistants, which poses a serious challenge to the security of multimedia applications. The limited resources of multimedia terminal devices hinder their ability to load speech spoofing detection models, while processing and analyzing speech in the cloud can result in poor real-time performance and potential privacy risks. In addition, existing speech spoofing detection methods rely heavily on annotated data and generalize poorly to unseen spoofed speech. To address these challenges, this paper first proposes the Coordinate Attention Network (CA2Net), which consists of coordinate attention blocks and Res2Net blocks. CA2Net can simultaneously extract temporal and spectral speech feature information and represent multi-scale speech features at a granular level. The paper also proposes GEMINI, a contrastive learning-based speech spoofing detection framework that can be deployed on edge nodes and autonomously learns speech features with strong generalization capabilities. GEMINI first performs data augmentation on speech signals and extracts conventional acoustic features to enhance feature robustness. It then uses CA2Net to further explore discriminative speech features, and employs a tensor-based multi-attention comparison model to maximize the consistency between speech contexts. GEMINI continuously updates CA2Net through contrastive learning, which enables CA2Net to represent speech signals effectively and detect spoofed speech accurately. Extensive experiments on the ASVspoof2019 dataset show that, compared to peer methods, GEMINI reduces the Equal Error Rate and the tandem Detection Cost Function by up to 96.75% and 96.35%, respectively, in the physical access scenario, and by up to 86.62% and 87.71% in the logical access scenario.
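The results above are stated as relative reductions in Equal Error Rate (EER). As a rough sketch of what that metric measures, the code below estimates the EER from a list of detection scores by sweeping thresholds, then computes a relative reduction between two error rates. This is not the paper's evaluation code: the function names, the label convention (1 = bona fide, 0 = spoofed, higher score = more likely bona fide), and all scores and error rates are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by a threshold sweep.

    Assumed convention: labels are 1 (bona fide) or 0 (spoofed), and a trial
    is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest.
    """
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    spoofed = [s for s, y in zip(scores, labels) if y == 0]
    if not bona_fide or not spoofed:
        raise ValueError("need at least one bona fide and one spoofed trial")

    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(scores)):
        far = sum(s >= t for s in spoofed) / len(spoofed)     # spoofed accepted
        frr = sum(s < t for s in bona_fide) / len(bona_fide)  # bona fide rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline, proposed):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up scores for eight trials (not data from the paper).
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%} at threshold {threshold}")

# Made-up error rates: going from 4.0% to 1.0% is a 75% relative reduction.
print(f"relative reduction = {relative_reduction(4.0, 1.0):.1f}%")
```

A relative reduction describes how much of the baseline error was removed, so a reduction of 96.75% means the remaining EER is 3.25% of the compared method's EER, not that the EER itself equals 3.25%.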


Articles published on Voice Activity Detection

An Ensemble Approach for Speaker Identification from Audio Files in Noisy Environments

Speech Perception Outcomes with the Anatomy-Based Fitting Map Among Experienced, Adult Cochlear Implant Users: A Longitudinal Study.

Exploiting speech tremors: machine learning for early diagnosis of amyotrophic lateral sclerosis

Multi-scale Information Aggregation for Spoofing Detection

AFP-Conformer: Asymptotic Feature Pyramid Conformer for Spoofing Speech Detection

Deep learning for hate speech detection: a comparative study

Research on multimodal hate speech detection based on self-attention mechanism feature fusion

Emotion Classification from Speech Waveform Using Machine Learning and Deep Learning Techniques

GASCOM: Graph-based Attentive Semantic Context Modeling for Online Conversation Understanding

A Comparative Study of Transformer-based Models for Hate-Speech Detection in English-Kiswahili Code-Switched Social Media Text

Pathological voice detection using optimized deep residual neural network and explainable artificial intelligence

Contrastive Learning based Speech Spoofing Detection for Multimedia Security in Edge Intelligence

Integrated noise suppression techniques for enhancing voice activity detection in degraded environments

Dual-stream Noise and Speech Information Perception based Speech Enhancement

Spoofing countermeasure for fake speech detection using brute force features

Understanding hate speech: the HateInsights dataset and model interpretability

Smart Dispenser Using Voice Recognition as an Assistive Device for the Visually Impaired with a Second-Order IIR Filter Algorithm

Exploring a Mobile Technology-Driven Model for Intercultural Communication Education

Real-time detection of spoken speech from unlabeled ECoG signals: A pilot study with an ALS participant.

TABHATE: A Target-based hate speech detection dataset in Hindi
