Automatic speaker and age identification of children from raw speech using sincNet over ERB scale

Kodali Radha, Mohan Bansal, Ram Bilas Pachori

doi:10.1016/j.specom.2024.103069

Abstract

This paper presents the newly developed non-native children’s English speech (NNCES) corpus and uses it to study automatic speaker and age recognition from raw speech. Convolutional neural networks (CNNs), which can learn low-level speech representations, can be fed raw speech signals directly instead of traditional hand-crafted features. However, the filters learned by standard CNNs tend to be noisy because all elements of each filter are learned. In contrast, sincNet can generate more meaningful filters simply by replacing the first convolutional layer of a standard CNN with a sinc-layer. The low and high cutoff frequencies of each rectangular band-pass filter are the only parameters learned in the sinc-layer, which gives it the potential to extract significant speaker cues from speech, such as pitch and formants. In this work, the sincNet model is modified by switching from the baseline Mel-scale initialization to an equivalent rectangular bandwidth (ERB) initialization, which has the added benefit of allocating more filters to the lower region of the spectrum. Additionally, the novel sincNet model is well suited to identifying the age of children. Experiments on both read and spontaneous speech tasks show that the proposed model outperforms the baseline models in speaker identification and in gender-independent and gender-dependent age-group identification of children, with varying relative improvements in accuracy.
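
The two ideas in the abstract, a band-pass filter defined only by its low and high cutoff frequencies and cutoffs initialized on the ERB scale instead of the Mel scale, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the Glasberg and Moore ERB-rate formula, a 16 kHz sampling rate, 80 filters, a 251-tap kernel, and a Hamming window, none of which are stated in the abstract; all function names are hypothetical.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    # Glasberg-Moore ERB-rate scale (assumed here; the abstract does not say which ERB formula is used)
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def init_cutoffs(n_filters, fs, to_scale, from_scale, f_min=30.0):
    """Return (low, high) cutoffs in Hz for filters equally spaced on a warped scale."""
    edges_warped = np.linspace(to_scale(f_min), to_scale(fs / 2.0), n_filters + 1)
    edges_hz = from_scale(edges_warped)
    return edges_hz[:-1], edges_hz[1:]

def sinc_bandpass(f_low, f_high, fs, kernel_size=251):
    """Windowed band-pass kernel: difference of two ideal low-pass (sinc) filters."""
    n = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    lo, hi = f_low / fs, f_high / fs  # normalized cutoffs (cycles per sample)
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f sinc(2 f n) is a low-pass with cutoff f
    g = 2.0 * hi * np.sinc(2.0 * hi * n) - 2.0 * lo * np.sinc(2.0 * lo * n)
    return g * np.hamming(kernel_size)  # window to reduce truncation ripple

fs = 16000       # assumed sampling rate in Hz
n_filters = 80   # assumed number of sinc filters

erb_low, erb_high = init_cutoffs(n_filters, fs, hz_to_erb_rate, erb_rate_to_hz)
mel_low, mel_high = init_cutoffs(n_filters, fs, hz_to_mel, mel_to_hz)

# Count how many filters start below 1 kHz under each initialization
print("ERB filters below 1 kHz:", int(np.sum(erb_low < 1000.0)))
print("Mel filters below 1 kHz:", int(np.sum(mel_low < 1000.0)))

# Example kernel for one band; in a sinc-layer only f_low and f_high would be trainable
g = sinc_bandpass(300.0, 600.0, fs)
```

Under these assumptions the ERB initialization places more filters below 1 kHz than the Mel initialization, which matches the benefit the abstract describes. In a trainable model the cutoffs would be parameters updated by gradient descent; here they are fixed values for illustration only.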


Journal: Speech Communication
Publication Date: Apr 1, 2024
Citations: 4

Similar Papers

PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform
Wencheng Li ... Jingyu Ning
01 Jan 2021

Speaker Recognition from Raw Waveform with SincNet
Mirco Ravanelli ... Yoshua Bengio
01 Dec 2018

Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications
Amira Shafik ... Abdullah M Iliyasu
Applied Acoustics | VOL. 177
26 Jan 2021

Comparison of Different Wavelet Packet Trees for Effective Speaker Recognition
G Renisha ... T Jayasree
01 Apr 2019
