Abstract

Grapheme-to-phoneme conversion (G2P) is important in both speech recognition and speech synthesis. The existing Indonesian G2P based on the pseudo nearest neighbour rule (PNNR) has two drawbacks: the grapheme encoding does not accommodate all Indonesian phonemic rules, and the PNNR must select the best phoneme from all possible conversions even though some of them could be filtered out by phonemic rules. In this paper, a modified partial orthogonal binary grapheme encoding and a phonemic rule-based filter are proposed to improve the performance of the PNNR-based Indonesian G2P. An evaluation using 5-fold cross-validation, with each fold using 40K words to develop the model and 10K words to evaluate it, shows that the two proposed concepts together yield a relative reduction in phoneme error rate (PER) of 13.07%. A more detailed analysis shows that most errors come from the grapheme ⟨e⟩, which can be converted into either /E/ or /ə/. Four prefixes, 'ber', 'me', 'per', and 'ter', produce many conversions that are ambiguous with basic words, and some similar compound words have different pronunciations for ⟨e⟩. A stemming procedure could be applied to reduce these errors.

Highlights

  • Phonemization, or letter-to-sound conversion, more commonly known as grapheme-to-phoneme conversion (G2P), is an important module in both speech recognition and speech synthesis

  • The phonemic rule filters the potential conversions from which the pseudo nearest neighbour rule (PNNR) selects; for instance, the first grapheme ⟨a⟩, followed by ⟨b⟩, in the grapheme sequence ⟨abai⟩ can be converted into either /A/ or /A+P/

  • This paper discusses how PNNR is used to develop the Indonesian G2P, the proposed modified partial orthogonal binary grapheme encoding and phonemic rule-based phoneme filtering, the experimental results showing the performance of both proposed concepts, and the conclusion

Summary

INTRODUCTION

Phonemization, or letter-to-sound conversion, more commonly known as grapheme-to-phoneme conversion (G2P), is an important module in both speech recognition and speech synthesis. A G2P is commonly developed using machine learning-based methods, such as instance-based learning [1], table lookup with defaults [1], self-learning techniques [2], hidden Markov models [3], morphology and phoneme history [4], joint multigram models [5], conditional random fields [6], and Kullback-Leibler divergence-based hidden Markov models [7]. These methods are generally very complex and designed to be language independent, but their performance varies for phonemically complex languages such as English, Dutch, French, and German.

In Indonesian, for example, a grapheme cannot be pronounced as /aU/ unless it is followed by ⟨u⟩ or ⟨w⟩. Such phonemic rules can be used to filter the possible conversions so that PNNR can convert a grapheme into the correct phoneme more accurately and faster. The phonemic rule narrows the set of potential conversions from which PNNR selects; for instance, the first grapheme ⟨a⟩, followed by ⟨b⟩, in the grapheme sequence ⟨abai⟩ can be converted into either /A/ or /A+P/. The PNNR then selects the best conversion for each grapheme from among the remaining possible phonemes.
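The filtering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the candidate table `CANDIDATES`, the function name `filter_candidates`, and the single rule encoded here are assumptions chosen to mirror the ⟨abai⟩ example and the /aU/ rule from the text. The classifier that would pick among the surviving candidates (PNNR in the paper) is left out.

```python
# Hypothetical candidate table: phonemes a grapheme might be converted into.
# Only the grapheme "a" is listed, and the set itself is an assumption.
CANDIDATES = {"a": ["A", "A+P", "aU"]}


def filter_candidates(word, i):
    """Return the candidate phonemes for word[i] that survive the rule.

    Illustrative rule (taken from the text): /aU/ is only possible
    when the grapheme is followed by "u" or "w".
    """
    grapheme = word[i]
    next_grapheme = word[i + 1] if i + 1 < len(word) else ""
    allowed = []
    for phoneme in CANDIDATES.get(grapheme, []):
        if phoneme == "aU" and next_grapheme not in ("u", "w"):
            continue  # ruled out before any classifier is consulted
        allowed.append(phoneme)
    return allowed


# First grapheme of "abai" is followed by "b", so /aU/ is filtered out.
print(filter_candidates("abai", 0))  # ['A', 'A+P']
```

Under these assumptions, the selection step only has to choose between the two remaining candidates, which is the source of the accuracy and speed benefit the text attributes to the phonemic rule.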

Data Preprocessing
Modified Grapheme Encoding
Phonemic Rule-based Phoneme Filtering
Pseudo Nearest Neighbour Rule
Optimum Parameters
EXPERIMENTAL RESULTS
Modified Grapheme Encoding and Phonemic Rule
CONCLUSION
Most Errors
