Abstract

Distinctive phonetic features (DPFs) abstractly describe the place of articulation, manner of articulation, and voicing of a language's phonemes. While DPFs are powerful features of speech signals that capture the unique articulatory characteristics of each phoneme, the task of DPF extraction is challenged by the need for an efficient computational model. Unlike ordinary acoustic features, which can be determined directly from the speech waveform using closed-form expressions, DPF elements are extracted from acoustic features using machine learning (ML) techniques. Therefore, to develop an acoustic-to-phonetic converter of high accuracy and low complexity, it is important to select input acoustic features that are simple yet carry adequate information. This paper examines the effectiveness of using the spectrogram as the acoustic feature, with DPFs modeled using two deep learning techniques: the deep belief network (DBN) and the convolutional recurrent neural network (CRNN). The proposed method is applied to Modern Standard Arabic (MSA). Multi-label modeling is considered in the proposed acoustic-to-phonetic converter. The learning techniques were evaluated using measures that accommodate the imbalanced nature of DPF elements. The results showed that the CRNN is more accurate than the DBN in extracting the DPFs.

Highlights

  • Distinctive phonetic features (DPFs) are relevant and highly descriptive features of speech waveforms that capture and represent the unique articulatory characteristics of each phoneme [1]

  • This paper proposes two DPF extractors based on the deep belief network (DBN) and convolutional recurrent neural network (CRNN) models, which attempt to find the weak and strong correlations between the considered DPF elements and the acoustic features embedded in Arabic speech and language

  • This study examined the advantages of deep learning in the DPF modeling and extraction of Arabic language phonemes

Summary

Introduction

Distinctive phonetic features (DPFs) are relevant and highly descriptive features of speech waveforms that capture and represent the unique articulatory characteristics of each phoneme [1]. A DPF vector is organized as a set of binary elements that describes each phoneme in terms of its articulatory and vocal properties [1]. For example, /θ/ is phonetically characterized as “unvoiced,” “fricative,” “interdental,” and “consonant,” so a potential DPF vector of /θ/ can be written as voiced−, consonant+, fricative+, interdental+. Since languages differ in terms of their DPF elements, DPFs are not language-neutral but language-dependent.

A. BACKGROUND ON DPF ELEMENTS

Phonemes are uttered by realizing the relevant DPF elements in coordination between the speaker’s brain and vocal system [2].
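
The binary DPF vector described in the introduction can be illustrated with a short sketch. The four-element inventory below is taken only from the /θ/ example in the text; the actual DPF element set used for MSA is not given in this excerpt, so `DPF_ELEMENTS`, `dpf_vector`, and `describe` are hypothetical names chosen for illustration. Each element becomes one binary label, which is also the shape of target a multi-label classifier would be trained against.

```python
# Hypothetical DPF inventory, limited to the elements named in the /θ/ example.
# The element set actually used for Modern Standard Arabic may differ.
DPF_ELEMENTS = ["voiced", "consonant", "fricative", "interdental"]


def dpf_vector(present):
    """Return a binary vector: 1 where the element is present (+), 0 where absent (-)."""
    unknown = set(present) - set(DPF_ELEMENTS)
    if unknown:
        raise ValueError(f"unknown DPF elements: {sorted(unknown)}")
    return [1 if name in present else 0 for name in DPF_ELEMENTS]


def describe(vector):
    """Render a binary DPF vector in the +/- notation used in the text."""
    return ", ".join(
        f"{name}{'+' if bit else '-'}" for name, bit in zip(DPF_ELEMENTS, vector)
    )


# /θ/ is unvoiced, so "voiced" is absent; the other three elements are present.
theta = dpf_vector({"consonant", "fricative", "interdental"})
print(theta)            # [0, 1, 1, 1]
print(describe(theta))  # voiced-, consonant+, fricative+, interdental+
```

Because every phoneme sets several elements at once, the extractor predicts all elements jointly rather than choosing a single class, which is why the task is framed as multi-label.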
