Abstract

Conversion between phonetic transcriptions and Myanmar text maps a sequence of phonetic symbols to the corresponding written Myanmar text, and vice versa. A phonetic transcription reflects how the language is pronounced, whereas Myanmar text reflects how it is written. A single phonetic symbol can correspond to many possible written forms, which leads to a word-sense ambiguity problem. A second problem is that neither phonetic transcriptions nor Myanmar text use spaces to mark syllable and word boundaries, so matching and mapping between the two also requires solving a segmentation problem. To address the word-sense ambiguity problem, this research built n-gram language models from correct Myanmar training data, and the system uses these trained models to convert phonetic transcriptions to Myanmar text. Instead of computing probabilities over the trained n-gram data, the system matches the input against the entries of the trained n-gram models. The system builds unigram, bigram, trigram, 4-gram, and 5-gram models to train and convert between phonetic transcriptions and Myanmar text. To address the segmentation problem, the system breaks the input text into individual tokens, each of which may represent a consonant, a consonant cluster, or a vowel. To segment the input correctly, whether it is Myanmar text or a phonetic transcription, the proposed system uses Unicode fonts for both.
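
The two steps described in the abstract, tokenizing the input and matching it against stored n-grams rather than scoring probabilities, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`tokenize_myanmar`, `build_table`, `convert`), the longest-match-first strategy, the first-seen-wins rule for duplicate keys, and the placeholder tokens (`ka`, `K1`, and so on) are all assumptions made for illustration. The tokenizer assumes only that a consonant (U+1000 to U+1021) followed by medial signs (U+103B to U+103E) forms one cluster token.

```python
from typing import Dict, List, Sequence, Tuple

MAX_N = 5  # the abstract describes models from unigram up to 5-gram

NGramTable = Dict[Tuple[str, ...], str]


def tokenize_myanmar(text: str) -> List[str]:
    """Split Myanmar text into consonant-cluster tokens and single code points.

    Simplifying assumption: a consonant (U+1000..U+1021) followed by medial
    signs (U+103B..U+103E) forms one token; every other code point is its
    own token.
    """
    tokens: List[str] = []
    for ch in text:
        is_medial = "\u103b" <= ch <= "\u103e"
        prev_starts_with_consonant = (
            bool(tokens) and "\u1000" <= tokens[-1][0] <= "\u1021"
        )
        if is_medial and prev_starts_with_consonant:
            tokens[-1] += ch  # extend the current consonant cluster
        else:
            tokens.append(ch)
    return tokens


def build_table(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]
) -> NGramTable:
    """Collect every 1..MAX_N-gram from token-aligned training pairs.

    Each pair is (phonetic_tokens, written_tokens) of equal length
    (hypothetical format). The first written form seen for a key is kept.
    """
    table: NGramTable = {}
    for phonetic, written in pairs:
        for n in range(1, MAX_N + 1):
            for i in range(len(phonetic) - n + 1):
                key = tuple(phonetic[i:i + n])
                table.setdefault(key, "".join(written[i:i + n]))
    return table


def convert(tokens: Sequence[str], table: NGramTable) -> str:
    """Convert by matching the longest stored n-gram at each position."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown token is passed through unchanged
            i += 1
    return "".join(out)


# Placeholder data: "ka" is written "K1" before "la" but "K2" before "ma".
training = [
    (["ka", "la"], ["K1", "L1"]),
    (["ka", "ma"], ["K2", "M1"]),
]
table = build_table(training)
```

With this toy table, `convert(["ka", "ma"], table)` picks the bigram entry and so resolves the ambiguous token `ka` from its context, while an isolated `ka` falls back to the unigram entry. Preferring longer matches is one plausible way to use context without computing probabilities; the abstract does not specify the actual matching order.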

