Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments

Zulfiqar Ali, Muhammad Talha

doi:10.1109/access.2018.2805845

Open Access

https://doi.org/10.1109/access.2018.2805845

Journal: IEEE Access
Publication Date: Jan 1, 2018
Citations: 56
License type: CC BY-NC-ND

Affiliation: King Saud University

Abstract

An accurate and noise-robust voice activity detection (VAD) system can be widely used in emerging speech technologies in the fields of audio forensics, wireless communication, and speech recognition. However, in real-life applications, a sufficient amount of data or human-annotated data to train such a system may not be available, so a supervised VAD system cannot be used in such situations. In this paper, an unsupervised method for VAD is proposed to label the speech-presence and speech-absence segments in an audio recording. To make the proposed method efficient and computationally fast, it is implemented using long-term features computed with the Katz algorithm for fractal dimension estimation. Two databases in different languages are used to evaluate the performance of the proposed method: the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is in English, and the King Saud University (KSU) speech database, which is in Arabic. TIMIT is recorded in only one environment, whereas the KSU speech database is recorded in distinct environments using various recording systems that contain sound cards of different qualities and models. The evaluation suggests that the proposed method labels voiced and unvoiced segments reliably in both clean and noisy audio.
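
As a rough illustration of the feature named in the abstract, the Katz fractal dimension of a waveform can be sketched as follows. This is a minimal sketch, not the authors' implementation: it uses a common simplified variant that measures curve length from amplitude differences only, and the function names `katz_fd` and `frame_features`, the frame and hop parameters, and the handling of flat or degenerate frames are all assumptions made for the example. The abstract does not describe how the long-term features are turned into speech-presence labels, so that step is left out.

```python
import math


def katz_fd(frame):
    """Katz fractal dimension of a 1-D sequence (amplitude-only variant).

    FD = log10(n) / (log10(n) + log10(d / L)), where
      L = total curve length (sum of absolute successive differences),
      d = largest distance from the first sample to any other sample,
      n = number of steps (L divided by the mean step length).
    """
    if len(frame) < 3:
        raise ValueError("need at least three samples")
    steps = [abs(b - a) for a, b in zip(frame, frame[1:])]
    total_length = sum(steps)
    if total_length == 0:
        # Assumption: treat a perfectly flat frame as a straight line.
        return 1.0
    diameter = max(abs(x - frame[0]) for x in frame[1:])
    log_n = math.log10(len(steps))
    denominator = log_n + math.log10(diameter / total_length)
    if denominator == 0:
        # Assumption: report a degenerate frame as infinite.
        return math.inf
    return log_n / denominator


def frame_features(signal, frame_len, hop):
    """Katz fractal dimension of each full frame of the signal."""
    last_start = len(signal) - frame_len
    return [
        katz_fd(signal[start:start + frame_len])
        for start in range(0, last_start + 1, hop)
    ]
```

Under this variant a straight ramp has dimension 1, while a more jagged waveform gives a larger value, which is why the measure can serve as a cheap per-frame descriptor of signal complexity.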

Similar Papers

Noise robust speech recognition using parallel model compensation and voice activity detection methods
Serhat Hizlisoy ... Zekeriya Tufekci
01 Dec 2016

Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network
Ozkan Arslan ... Erkan Zeki Engin
Electrica | VOL. 19
08 Jul 2019

Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection
Masakiyo Fujimoto ... Tomohiro Nakatani
Speech Communication | VOL. 54
16 Sep 2011

Comparison of acoustic and visual voice activity detection for noisy speech recognition
Piotr Bratoszewski ... Grzegorz Szwoch
01 Sep 2016
