Glottal Closure Regions Research Articles

Abstract: This work presents an automatic Indian language identification (LID) system based on tonal/nontonal preclassification. Languages are first classified into tonal and nontonal categories, and individual languages are then identified within the respective category. The work proposes pitch Chroma and formant features for this task and investigates how Mel‐frequency cepstral coefficients (MFCCs) complement them. It further explores block processing (BP), pitch synchronous analysis (PSA), and glottal closure region (GCR) based approaches to feature extraction, using syllables as basic units. A cascade convolutional neural network (CNN)‐long short‐term memory (LSTM) model using syllable‐level features has been developed. The National Institute of Technology Silchar language database (NITS‐LD) and the OGI Multilingual Telephone Speech Corpus (OGI‐MLTS) are used for experimental validation. The proposed system, which combines the scores of cascade CNN‐LSTM models of Chroma (extracted with the BP method), the first two formants, and MFCCs (both extracted with the GCR method), reports the highest accuracies. In the preclassification stage, the accuracies for NITS‐LD are 91%, 87.3%, and 85.1% for 30 s, 10 s, and 3 s test data, respectively; for OGI‐MLTS, the respective accuracies are 86.7%, 83.1%, and 80.6%. These are absolute improvements over the baseline system of 11.6%, 12.3%, and 13.9% for NITS‐LD and 12.5%, 11.9%, and 12.6% for OGI‐MLTS. The full preclassification‐based LID system improves on the baseline by 7.3%, 6.4%, and 7.4% for NITS‐LD and by 6.1%, 6.7%, and 7.2% for OGI‐MLTS under the three respective test conditions.
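The two-stage decision described in this abstract (tonal/nontonal preclassification, then language identification within the chosen category, with scores combined across feature-specific models) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the weighted-sum fusion rule, the equal default weights, the function names `fuse_scores` and `identify_language`, and the placeholder language labels are all assumptions, since the abstract does not state how scores are combined.

```python
import numpy as np


def fuse_scores(score_list, weights=None):
    """Weighted-sum fusion of per-class scores from several models.

    score_list has shape (n_models, n_classes). Equal weights are an
    assumption; the abstract does not give the combination rule.
    """
    scores = np.asarray(score_list, dtype=np.float64)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=np.float64) @ scores


def identify_language(category_scores, language_scores, languages):
    """Two-stage decision: pick a category, then a language inside it.

    category_scores: per-model scores over ("tonal", "nontonal")
    language_scores: dict mapping category -> per-model scores over
                     the languages of that category
    languages:       dict mapping category -> list of language labels
    """
    categories = ("tonal", "nontonal")
    category = categories[int(np.argmax(fuse_scores(category_scores)))]
    fused = fuse_scores(language_scores[category])
    return category, languages[category][int(np.argmax(fused))]
```

Here each row of a score array would come from one feature-specific model (for example Chroma, formants, or MFCCs), so the second stage only compares languages that belong to the category chosen in the first stage.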

Vowels are produced with an open configuration of the vocal tract, without any audible friction. The acoustic signal is relatively loud, with varying strength of impulse-like excitation, and vowels carry significant energy in the low-frequency bands of the speech signal. Acoustic events such as the vowel onset point (VOP) and vowel end-point (VEP) can be used as landmarks to detect vowel regions in a speech signal. In this paper, a two-stage algorithm is proposed to detect precise vowel regions. In the first stage, the speech signal is processed using zero frequency filtering to emphasize the energy in its low-frequency bands; because the signal is filtered around 0 Hz, the zero frequency filtered signal predominantly contains low-frequency content. Dominant spectral peaks are then extracted from the magnitude spectrum around the glottal closure regions of the speech signal. The VOPs and VEPs are obtained by convolving the enhanced spectral contour of the zero frequency filtered signal with a first-order Gaussian differentiator. In the second stage, post-processing is carried out in the regions around each VOP and VEP to remove spurious vowel regions based on the uniformity of epoch intervals. In addition, the positions of the VOPs and VEPs are corrected using the strength of excitation of the speech signal. The performance of the proposed vowel region detection method is compared with existing state-of-the-art methods on the TIMIT acoustic-phonetic speech corpus. The method is reported to significantly improve vowel region detection in both clean and noisy environments.
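Two of the signal-processing steps named in this abstract, zero frequency filtering and convolution with a first-order Gaussian differentiator, can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: the filter follows the commonly described formulation (differencing, two cascaded 0 Hz resonators, repeated local-mean subtraction), and the window length, number of trend-removal passes, `sigma`, `half_len`, and all function names are hypothetical choices. The spectral-peak extraction and the post-processing stage are not shown.

```python
import numpy as np


def zero_frequency_filter(speech, fs, avg_pitch_ms=10.0, passes=3):
    """Return a zero-frequency-filtered signal (assumed formulation)."""
    # Difference the signal to remove any slowly varying bias.
    x = np.diff(np.asarray(speech, dtype=np.float64), prepend=0.0)
    # Two cascaded 0 Hz resonators; each one is a double integrator,
    # so the cascade amounts to four cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by subtracting a local mean taken over
    # about 1.5 average pitch periods (assumed window length).
    half = int(round(0.75 * avg_pitch_ms * 1e-3 * fs))
    window = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, window, mode="same")
    return y  # samples near both ends are unreliable (edge effects)


def epochs_from_zff(zff):
    """Epoch candidates: negative-to-positive zero crossings."""
    return np.flatnonzero((zff[:-1] < 0) & (zff[1:] >= 0)) + 1


def gaussian_differentiator(sigma, half_len):
    """First-order derivative of a Gaussian, sampled on [-half_len, half_len]."""
    t = np.arange(-half_len, half_len + 1, dtype=np.float64)
    return -t / sigma**2 * np.exp(-t**2 / (2 * sigma**2))


def onset_and_end(contour, sigma=20.0, half_len=80):
    """Locate the strongest rise and fall of an evidence contour.

    Convolving with the Gaussian differentiator gives a positive peak at
    a rising edge (onset candidate) and a negative peak at a falling edge
    (end-point candidate). Only the single strongest pair is returned.
    """
    kernel = gaussian_differentiator(sigma, half_len)
    evidence = np.convolve(contour, kernel, mode="same")
    return int(np.argmax(evidence)), int(np.argmin(evidence))
```

A contour covering several vowels would need all local peaks and valleys of the convolved evidence rather than one global maximum and minimum; the single-pair version is kept here only to keep the sketch short.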

Articles published on Glottal Closure Regions

Cascade convolutional neural network‐long short‐term memory recurrent neural networks for automatic tonal and nontonal preclassification‐based Indian language identification

Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Identification of Indian languages using multi-level spectral and prosodic features

Pitch synchronous and glottal closure based speech analysis for language recognition

Detection of Vowel Offset Point From Speech Signal

Vowel onset point detection for noisy speech using spectral energy at formant frequencies

Vowel Onset Point Detection for Low Bit Rate Coded Speech

Emotion recognition from speech using source, system, and prosodic features

Improved consonant–vowel recognition for low bit‐rate coded speech
