Abstract
The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection according to the output of a background Noise Classifier (NC) and a Signal-to-Noise Ratio Estimation (SNRE) system. The system was implemented, and tests were performed using the TIMIT speech corpus and its phonetic classification. The results were compared with a nonadaptive classification system and the V/UV detectors adopted by two important speech coding standards: the V/UV detection system in the ETSI ES 202 212 v1.1.2 and the speech classification in the Selectable Mode Vocoder (SMV) algorithm. In all cases the proposed adaptive V/UV classifier outperforms the traditional solutions, giving an improvement of 25% in very noisy environments.
Highlights
Voicing Detection Algorithms (VDAs) have been among the most analysed topics in speech processing research during the last three decades [1, 2]. The correct Voiced/Unvoiced (V/UV) classification of a sound is essential in several speech processing systems.
The results were compared with a nonadaptive classification system and the V/UV detectors adopted by two important speech coding standards: the V/UV detection system in the ETSI ES 202 212 v1.1.2 and the speech classification in the Selectable Mode Vocoder (SMV) algorithm.
In all cases the proposed adaptive V/UV classifier outperforms the traditional solutions, giving an improvement of 25% in very noisy environments.
Summary
Voicing Detection Algorithms (VDAs) have been among the most analysed topics in speech processing research during the last three decades [1, 2]. The correct Voiced/Unvoiced (V/UV) classification of a sound is essential in several speech processing systems. In general, several aspects must be analysed and taken into consideration when developing a voiced/unvoiced detection system: the complexity of the algorithm; the delay introduced (and the duration of the analysis window in which the decision is made); robustness to noise (mainly channel and/or background noise); the overall performance of the system; any other phonetic classes to be considered (silence/background noise, mixed sounds, etc.); and the training and testing database used to design and test the algorithm (in particular the duration, the number of different speakers, the number of languages, the types of digitally added noise, the sampling frequency, etc.).
More From: EURASIP Journal on Audio, Speech, and Music Processing