Abstract
This paper describes an effect of articulatory dynamic parameters (Δ and ΔΔ) on neural network based automatic speech recognition(ASR). Articulatory features (AFs) or distinctive phonetic features (DPFs)-based system shows its superiority in performances over acoustic features- based in ASR. These performances can be further improved by incorporating articulatory dynamic parameters into it. In this paper, we have proposed such a phoneme recognition system that comprises three stages: (i) DPFs extraction using a multilayer neural network (MLN) from acoustic features, (ii) incorporation of dynamic parameters into another MLN for reducing DPF context, and (iii) addition of an Inhibition/Enhancement (In/En) network for categorizing the DPF movement more accurately and Gram-Schmidt (GS) orthogonalization procedure for decorrelating the inhibited/enhanced data vector before connecting with hidden Markov model (HMMs)-based classifier. From the experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate over the method that does not incorporate dynamic articulatory parameters. Moreover, it reduces mixture components in HMM for obtaining a higher recognition performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.