Abstract

Problem statement: Building an utterance training system for the Indonesian language requires a speech recognition system designed for Indonesian. However, such a system often performs poorly because pronunciation variants in non-native utterances can lead to substitution and deletion errors. This research investigated these pronunciation variants and proposes an acoustic model adaptation to improve the performance of the system.

Approach: The proposed acoustic model adaptation works in three steps: (1) analyzing pronunciation variants with knowledge-based and data-derived methods; (2) aligning the knowledge-based and data-derived results to list frequently mispronounced phones together with their variants; and (3) performing a state-clustering procedure with the list obtained in the second step. In addition, three Speaker Adaptation (SA) techniques were used in combination with the acoustic model adaptation and compared with each other. To evaluate and tune the adaptation techniques, a perceptual-based evaluation by three human raters was performed to obtain the true recognition results.

Results: Compared with the system without adaptation, the proposed method achieved an average gain in Hit + Rejection (the percentage of utterances that the system correctly accepts or correctly rejects, in agreement with the human raters) of 2.9 points for native subjects and 2 points for non-native subjects. Combining SA with the acoustic model adaptation yielded average gains in Hit + Rejection of 12.7 and 6.2 points for native and non-native students, respectively.

Conclusion/Recommendations: Performance evaluation of the adapted system demonstrated that the proposed acoustic model adaptation can improve Hit, albeit with a slight increase in False Alarm (FA, the percentage of utterances that the system incorrectly accepts although the human raters reject them).
The performance of the proposed acoustic model adaptation depends strongly on how effectively the state-clustering procedure recovers only in-vocabulary words. Future research will investigate a confidence measure to discriminate between in-vocabulary and out-of-vocabulary words.
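The second step of the approach, aligning knowledge-based and data-derived results to list frequently mispronounced phones with their variants, might be sketched as an intersection of the two sources. This is a minimal sketch under stated assumptions, not the paper's procedure: it assumes the knowledge-based analysis yields, for each canonical Indonesian phone, a set of predicted variants, and that the data-derived analysis yields (canonical, recognized) phone pairs from aligning recognizer output with canonical transcriptions. The function name `frequent_variants`, the `min_count` threshold, the `"-"` deletion symbol and the phone labels in the example are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def frequent_variants(
    predicted: Dict[str, Set[str]],
    observed_pairs: Iterable[Tuple[str, str]],
    min_count: int = 2,
) -> Dict[str, List[Tuple[str, int]]]:
    """Hypothetical helper: keep only variants that the knowledge-based
    analysis predicts AND that the data show at least `min_count` times."""
    # Count mismatches only; a correctly recognized phone is not a variant.
    # A deletion is assumed to be encoded as the recognized symbol "-".
    counts = Counter(
        (canon, recog) for canon, recog in observed_pairs if canon != recog
    )
    result: Dict[str, List[Tuple[str, int]]] = {}
    for (canon, recog), n in counts.items():
        if n >= min_count and recog in predicted.get(canon, set()):
            result.setdefault(canon, []).append((recog, n))
    # List the most frequent variant first for each mispronounced phone.
    for variants in result.values():
        variants.sort(key=lambda item: (-item[1], item[0]))
    return result


if __name__ == "__main__":
    predicted = {"r": {"l", "-"}, "ng": {"n"}}  # illustrative only
    observed = [("r", "l")] * 3 + [("r", "-")] * 2 + [("ng", "n"), ("a", "o"), ("a", "o")]
    print(frequent_variants(predicted, observed))  # → {'r': [('l', 3), ('-', 2)]}
```

In this sketch, a variant that is predicted but rarely observed (`ng` → `n`), or frequent but not predicted (`a` → `o`), is left out of the list that would feed the state-clustering step.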

Highlights

  • In recent years, there has been increasing interest among foreign students in studying in Indonesia, especially the Indonesian language and local culture

  • For the baseline system adapted with Acoustic Model Adaptation (AMA), Hit + Rejection increases by 2.0 points (from 68% to 70%) over the unadapted baseline system when evaluated on non-native students

  • Experiments on 500 non-native utterances from very-beginner-level students yielded fair correct acceptance rates (Hit) of 66.9%, 64% and 70.2%, respectively


Summary

INTRODUCTION

There is increasing interest among foreign students in studying in Indonesia, especially the Indonesian language and local culture. In line with the main idea of several published works, the proposed acoustic model adaptation works as follows: the phones that non-native subjects frequently mispronounce, together with their pronunciation variants, are identified through an alignment analysis between knowledge-based and data-derived results. The knowledge-based method relies on human raters to carry out a phonetic analysis comparing the Indonesian language with the native languages of the non-native subjects. Human raters are also necessary in the proposed acoustic model adaptation to provide a standard evaluation against which the recognition results of the system are judged, as described in (Neumeyer et al., 1996; Franco et al., 1997). The performance of the proposed acoustic model adaptation is evaluated with five measures obtained by aligning the recognition results with the perceptual-based evaluation: Hit, False Alarm (FA), Miss, Rejection and Hit + Rejection.
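The five measures can be computed from a per-utterance comparison of the system's accept/reject decision with the human raters' judgment. The sketch below is illustrative and rests on assumptions not stated in this section: the three raters' votes are combined by majority vote, every measure is a percentage of all evaluated utterances (so Hit, FA, Miss and Rejection sum to 100), and Miss is the case where the system rejects an utterance the raters accept. The names `score`, `majority_vote` and `AgreementScores` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AgreementScores:
    """Percentages of all utterances (assumed denominator)."""
    hit: float          # system accepts, raters accept
    false_alarm: float  # system accepts, raters reject
    miss: float         # system rejects, raters accept
    rejection: float    # system rejects, raters reject

    @property
    def hit_plus_rejection(self) -> float:
        # Share of utterances on which the system agrees with the raters.
        return self.hit + self.rejection


def majority_vote(votes: Sequence[bool]) -> bool:
    """Assumed rule: accept if more than half of the raters accept."""
    return sum(votes) * 2 > len(votes)


def score(
    system_accepts: Sequence[bool],
    rater_votes: Sequence[Sequence[bool]],
) -> AgreementScores:
    if len(system_accepts) != len(rater_votes):
        raise ValueError("need one set of rater votes per utterance")
    if not system_accepts:
        raise ValueError("no utterances to score")
    hit = fa = miss = rej = 0
    for sys_ok, votes in zip(system_accepts, rater_votes):
        human_ok = majority_vote(votes)
        if sys_ok and human_ok:
            hit += 1
        elif sys_ok:       # accepted by the system, rejected by the raters
            fa += 1
        elif human_ok:     # rejected by the system, accepted by the raters
            miss += 1
        else:
            rej += 1
    n = len(system_accepts)

    def pct(count: int) -> float:
        return 100.0 * count / n

    return AgreementScores(pct(hit), pct(fa), pct(miss), pct(rej))


if __name__ == "__main__":
    system = [True, True, True, False, False]
    raters = [
        (True, True, False),    # majority accept -> Hit
        (True, True, True),     # Hit
        (False, False, True),   # majority reject -> FA
        (True, True, True),     # Miss
        (False, False, False),  # Rejection
    ]
    print(score(system, raters).hit_plus_rejection)  # → 60.0
```

A gain in points, such as the 2.0-point increase (68% to 70%) reported in the highlights, would then be the difference between the `hit_plus_rejection` values of the adapted and baseline systems on the same utterances.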
