Abstract

Human non‐speech sounds occur during expression in real‐life environments. Recognising a person's inability to produce confident expressions through non‐speech sounds may assist in identifying early‐stage disorders in medical applications. A novel dataset named Nonspeech7k is introduced that contains a diverse set of human non‐speech sounds, such as the sounds of breathing, coughing, crying, laughing, screaming, sneezing, and yawning. The authors then conduct a variety of classification experiments with end‐to‐end deep convolutional neural networks (CNNs) to show the performance of the dataset. First, a set of typical deep classifiers is used to verify the reliability and validity of Nonspeech7k. The CNN models involved include the 1D‐2D deep CNN EnvNet, the deep stacked CNNs M11 and M18, the residual‐block CNN ResNet34, a modified M11 named M12, and the authors' baseline model. Among these, M12 achieves the highest accuracy, at 79%. Second, to verify the heterogeneity of Nonspeech7k with respect to two typical datasets, FSD50K and VocalSound, the authors design a series of experiments that analyse the classification performance of the deep neural network classifier M12 when trained on FSD50K, FSD50K + Nonspeech7k, VocalSound, and VocalSound + Nonspeech7k, respectively. Experimental results show that the classifier trained on existing datasets mixed with Nonspeech7k achieves an accuracy improvement of up to 15.7% over training without Nonspeech7k. Nonspeech7k is 100% annotated, completely checked, and free of noise. It is available at https://doi.org/10.5281/zenodo.6967442.
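To make the "end-to-end" setup concrete, the sketch below shows the shape of such a pipeline on a raw waveform: a 1D convolution bank, ReLU, global average pooling, and a linear softmax head over the seven Nonspeech7k classes. The class names come from the abstract; all layer sizes, kernel widths, and weights are illustrative placeholders, not the architecture of M12 or any model from the paper.

```python
import numpy as np

# The seven Nonspeech7k classes named in the abstract.
CLASSES = ["breathing", "coughing", "crying", "laughing",
           "screaming", "sneezing", "yawning"]

def conv1d(x, kernels, stride):
    """Valid 1D convolution of a mono waveform x (shape [T])
    with a bank of kernels (shape [C, K])."""
    C, K = kernels.shape
    n_steps = (len(x) - K) // stride + 1
    out = np.empty((C, n_steps))
    for t in range(n_steps):
        window = x[t * stride : t * stride + K]
        out[:, t] = kernels @ window
    return out

def classify(waveform, kernels, weights, stride=4):
    """Toy end-to-end pipeline: conv -> ReLU -> global average
    pooling -> linear head -> softmax over the 7 classes."""
    feats = np.maximum(conv1d(waveform, kernels, stride), 0.0)  # ReLU
    pooled = feats.mean(axis=1)                                 # [C]
    logits = weights @ pooled                                   # [7]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
wave = rng.standard_normal(16_000)            # 1 s of synthetic 16 kHz audio
kernels = rng.standard_normal((8, 64)) * 0.1  # 8 illustrative filters
weights = rng.standard_normal((7, 8)) * 0.1   # illustrative linear head
probs = classify(wave, kernels, weights)
print(CLASSES[int(np.argmax(probs))])
```

With random weights the prediction is of course arbitrary; in the paper's experiments, such a network would instead be trained on Nonspeech7k (optionally mixed with FSD50K or VocalSound) so the softmax output reflects the seven sound categories.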
